Introduction



The probabilistic theory of coalescence, which is the primary subject of these notes, has expanded at a quick pace over the last decade or so. I can think of three factors which have substantially contributed to this growth. On the one hand, there has been a rising demand from population geneticists to develop and analyse models which incorporate more realistic features than Kingman's coalescent allows for. Simultaneously, the field has matured enough that a wide range of techniques from modern probability theory may be successfully applied to these questions. These tools include, for instance, martingale methods, renormalization and random walk arguments, combinatorial embeddings, sample path analysis of Brownian motion and Lévy processes, and, last but not least, continuum random trees and measure-valued processes. Finally, coalescent processes arise in a natural way from spin glass models of statistical physics. The identification of the Bolthausen-Sznitman coalescent as a universal scaling limit in those models, and the connection made by Brunet and Derrida to models of population genetics, are very exciting recent developments.

The purpose of these notes is to give a quick introduction to the 
mathematical aspects of these various ideas, and to the biological 
motivations underlying them. We have tried to make these notes 
as self-contained as possible, but within the limits imposed by the 
desire to make them short and keep them accessible. Of course, the 
price to pay for this is a lack of mathematical rigour. Often we skip 
the technical parts of arguments, and instead focus on some of the 
key ideas that go into the proof. The level of mathematical preparation
required to read these notes is roughly that of two courses in
probability theory. Thus we will assume that the reader is familiar 
with such notions as Poisson point processes and Brownian motion. 

Sadly, several important and beautiful topics are not discussed. 
The most obvious such topics are the Marcus-Lushnikov processes 
and their relation to the Smoluchowski equations, as well as works on 
simultaneous multiple collisions. Also not appearing in these notes 
is the large body of work on random fragmentation. For all these 
and further omissions, I apologise in advance. 

A first draft of these notes was prepared for a set of lectures at IMPA in January 2009. Many thanks to Vladas Sidoravicius and Maria Eulalia Vares for their invitation, and to Vladas in particular for arranging many details of the trip. I lectured again on this material at Eurandom on the occasion of the conference Young European Probabilists in March 2009. Thanks to Julien Berestycki and Peter Mörters for organizing this meeting and for their invitation. I also want to thank Charline Smadi-Lasserre for a careful reading of an early draft of these notes.

Many thanks to the people with whom I learnt about coalescent processes: first and foremost, my brother Julien, and my other collaborators on this topic: Alison Etheridge, Vlada Limic, and Jason Schweinsberg. Thanks are also due to Rick Durrett and Jean-François Le Gall for triggering my interest in this area while I was their PhD student.
1 Random exchangeable partitions



This chapter introduces the reader to the theory of exchangeable random partitions, which is a basic building block of coalescent theory. This theory is essentially due to Kingman; the basic result (a variation on De Finetti's theorem) allows one to think of a random partition either as a discrete object, taking values in the set $\mathcal{P}$ of partitions of $\mathbb{N} = \{1, 2, \ldots\}$, or as a continuous object, taking values in the set $\mathcal{S}_0$ of tilings of the unit interval $(0,1)$. These two points of view are strictly equivalent, which makes the theory quite elegant: sometimes a property is better expressed on a random partition viewed as a partition of $\mathbb{N}$, and sometimes it is better viewed as a property of partitions of the unit interval. We then take a look at a classical example of random partitions known as the Poisson-Dirichlet family, which, as we partly show, arises in a huge variety of contexts. We then present some recent results that can be labelled as "Tauberian theory", which takes a particularly elegant form here.

1.1 Definitions and basic results 

We first fix some vocabulary and notation. A partition $\pi$ of $\mathbb{N}$ is an equivalence relation on $\mathbb{N}$. The blocks of the partition are the equivalence classes of this relation. We will sometimes write $i \sim j$ or $i \sim_\pi j$ to denote that $i$ and $j$ are in the same block of $\pi$. Unless otherwise specified, the blocks of $\pi$ will be listed in increasing order of their least elements: thus, $B_1$ is the block containing 1, $B_2$ is the block containing the smallest element not in $B_1$, and so on. The space of partitions of $\mathbb{N}$ is denoted by $\mathcal{P}$. There is a natural distance on $\mathcal{P}$: take $d(\pi, \pi')$ to be equal to 1 over the largest $n$ such that the restrictions of $\pi$ and $\pi'$ to $\{1, \ldots, n\}$ are identical. Equipped with this distance, $\mathcal{P}$ is a Polish space. This is useful when speaking about random partitions, since it allows us to talk about convergence in distribution, conditional distributions, etc. We also let $[n] = \{1, \ldots, n\}$ and let $\mathcal{P}_n$ be the space of partitions of $[n]$. Given a partition $\pi = (B_1, B_2, \ldots)$ and a block $B$ of that partition, we denote by $|B|$ the quantity, if it exists:

$$|B| = \lim_{n \to \infty} \frac{\mathrm{Card}(B \cap [n])}{n}.$$

$|B|$ is called the asymptotic frequency of the block $B$, and is a measure of its relative size; for this reason we will often refer to it as its mass. For instance, if $\pi$ is the partition of $\mathbb{N}$ into odd and even integers, there are two blocks, each with mass 1/2. The following definition is key to what follows. If $\sigma$ is a permutation of $\mathbb{N}$ with finite support (i.e., it actually permutes only finitely many points), and $\Pi$ is a partition, then one can define a new partition $\Pi^\sigma$ by exchanging the labels of integers according to $\sigma$. That is, $i$ and $j$ are in the same block of $\Pi^\sigma$ if and only if $\sigma(i)$ and $\sigma(j)$ are in the same block of $\Pi$.

Definition 1.1. An exchangeable random partition $\Pi$ is a random element of $\mathcal{P}$ whose law is invariant under the action of any permutation $\sigma$ of $\mathbb{N}$ with finite support: that is, $\Pi$ and $\Pi^\sigma$ have the same distribution for all such $\sigma$.

To put things into words, an exchangeable random partition is a partition which ignores the labels of the integers. This suggests that exchangeable random partitions are only relevant when working under mean-field assumptions. However, this is slightly misleading. For instance, enumerate all vertices of $\mathbb{Z}^d$ as $(v_1, v_2, \ldots)$ in some arbitrary order, and say that $i$ and $j$ are in the same block of $\Pi(\omega)$ if and only if $v_i$ and $v_j$ are in the same connected component in a realisation $\omega$ of bond percolation on $\mathbb{Z}^d$ with parameter $0 < p < 1$; then the resulting random partition is not exchangeable. On the other hand, if $(V_1, V_2, \ldots)$ are independent random vertices chosen according to some given distribution on $\mathbb{Z}^d$ (independently of $\omega$), then the random partition defined by putting $i$ and $j$ in the same block if $V_i$ and $V_j$ are in the same connected component is exchangeable. Indeed, later in these notes we will see several examples where random partitions arise from a nontrivial spatial structure.

Kingman's theorem, which is the main result of this section, starts with the observation that given a tiling of the unit interval, there is always a neat way to generate an exchangeable random partition associated with this tiling. To be formal, let $\mathcal{S}_0$ be the space of tilings of the unit interval $(0,1)$, that is, sequences $s = (s_0, s_1, \ldots)$ with $s_1 \ge s_2 \ge \cdots \ge 0$ and $\sum_{i \ge 0} s_i = 1$ (note that we do not require $s_0 \ge s_1$):

$$\mathcal{S}_0 = \Big\{ s = (s_0, s_1, \ldots) : s_1 \ge s_2 \ge \cdots \ge 0, \ \sum_{i=0}^{\infty} s_i = 1 \Big\}.$$

The coordinate $s_0$ plays a special role in this sequence, which is why monotonicity is only required starting at $i = 1$ in this definition. An element of $\mathcal{S}_0$ may be viewed as a tiling of $(0,1)$, where the sizes of the tiles are precisely equal to $s_0, s_1, \ldots$. The ordering of the tiles is irrelevant for now, but for the sake of simplicity we will order them from left to right: the first tile is $J_0 = (0, s_0)$, the second is $J_1 = (s_0, s_0 + s_1)$, etc. Let $s \in \mathcal{S}_0$, and let $U_1, U_2, \ldots$ be i.i.d. uniform random variables on $(0,1)$. For $0 < u < 1$ let $I(u) \in \{0, 1, \ldots\}$ denote the index of the component (tile) of $s$ which contains $u$. That is,

$$I(u) = \inf\Big\{ n \ge 0 : \sum_{i=0}^{n} s_i > u \Big\}.$$
Let $\Pi$ be the random partition defined by saying $i \sim j$ if and only if $I(U_i) = I(U_j) \ge 1$ or $i = j$ (see Figure 1). Note that in this construction, if $U_i$ falls into the 0th part of $s$, then $i$ is guaranteed to form a singleton in the partition $\Pi$. On the other hand, if $I(U_i) \ge 1$, then almost surely the block containing $i$ has infinitely many members, and in fact, by the law of large numbers, the frequency of this block is well defined and strictly positive. For this reason, the part $s_0$ of $s$ is referred to as the dust of $s$. We will say that $\Pi$ has no dust if $s_0 = 0$, i.e., if $\Pi$ almost surely has no singletons.

The partition $\Pi$ described by the above construction is exchangeable, since the law of $(U_1, \ldots, U_n)$ is the same as that of $(U_{\sigma(1)}, \ldots, U_{\sigma(n)})$ for each $n \ge 1$ and for each permutation $\sigma$ with support in $[n]$.

Definition 1.2. $\Pi$ is the paintbox partition derived from $s$.

The name paintbox refers to the fact that each part of $s$ defines a colour, and we paint $i$ with the colour of the part in which $U_i$ falls. If $U_i$ falls in the dust part $J_0$, then we paint $i$ with a unique, new colour. The partition $\Pi$ is then obtained by identifying integers with the same colour.
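The paintbox construction is straightforward to simulate. The following Python sketch is our own illustration: the function name `paintbox` and the representation of a finite tiling as a tuple $s = (s_0, s_1, \ldots, s_K)$ are assumptions, not notation from the text. It lays out the tiles from left to right, computes $I(U_i)$ by binary search over the right endpoints of the tiles, and returns $\Pi|_{[n]}$ with blocks ordered by least element.

```python
import bisect
import itertools
import random

def paintbox(s, n, rng=random):
    """Sample the restriction to [n] of the paintbox partition derived from s.

    s = (s0, s1, ..., sK) is a finite tiling of (0, 1); s0 is the dust and the
    tiles J_0, J_1, ... are laid out from left to right. Blocks are returned
    as lists of integers, ordered by their least elements.
    """
    ends = list(itertools.accumulate(s))  # right endpoints of J_0, J_1, ...
    colour_blocks = {}                    # tile index >= 1 -> its block
    blocks = []
    for i in range(1, n + 1):
        u = rng.random()
        # bisect_right gives the first index whose right endpoint exceeds u,
        # i.e. I(u); clamp it in case floating-point sums fall short of 1.
        k = min(bisect.bisect_right(ends, u), len(s) - 1)
        if k == 0:
            blocks.append([i])            # dust: i receives a brand-new colour
        elif k in colour_blocks:
            colour_blocks[k].append(i)
        else:
            colour_blocks[k] = [i]
            blocks.append(colour_blocks[k])
    return blocks
```

Calling `paintbox((1.0,), n)` gives pure dust (all singletons), while `paintbox((0.0, 1.0), n)` gives a single block; for a general tiling, roughly a fraction $s_0$ of the integers fall in the dust.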
Figure 1: The paintbox process associates a random partition $\Pi$ to any tiling of the unit interval. Here $\Pi|_{[8]} = (\{1, 4\}, \{2\}, \{3, 7\}, \{5\}, \{6\}, \{8\})$. Note how 2 and 6 form singletons.

Note that this construction still gives an exchangeable random partition if $s$ is a random element of $\mathcal{S}_0$, provided that the sequence $(U_i)$ is chosen independently of $s$. Kingman's theorem states that this is the most general form of exchangeable random partition. For $s \in \mathcal{S}_0$, let $\rho_s$ denote the law on $\mathcal{P}$ of a paintbox partition derived from $s$.

Theorem 1.1. (Kingman [107]) Let $\Pi$ be any exchangeable random partition. Then there exists a probability distribution $\mu(ds)$ on $\mathcal{S}_0$ such that

$$\mathbb{P}(\Pi \in \cdot) = \int_{s \in \mathcal{S}_0} \mu(ds)\, \rho_s(\cdot).$$

Sketch of proof. We briefly sketch Aldous' proof of this result [2], which relies on De Finetti's theorem on exchangeable sequences of random variables. This theorem states the following: if $(X_1, \ldots)$ is an infinite exchangeable sequence of real-valued random variables (i.e., its law is invariant under the permutation of finitely many indices), then there exists a random probability measure $\mu$ such that, conditionally given $\mu$, the $X_i$'s are i.i.d. with law $\mu$. Now, let $\Pi$ be an exchangeable partition. Define a random map $\varphi: \mathbb{N} \to \mathbb{N}$ as follows: if $i \in \mathbb{N}$, then $\varphi(i)$ is the smallest integer in the same block as $i$. Thus the blocks of the partition $\Pi$ may be regarded as the sets of points which share a common value under the map $\varphi$. In parallel, take an independent sequence of i.i.d. uniform random variables $(U_1, \ldots)$ on $[0,1]$, and define $X_i = U_{\varphi(i)}$. It is immediate that $(X_1, \ldots)$ is exchangeable, and so De Finetti's theorem applies. Thus there exists $\mu$ such that, conditionally given $\mu$, $(X_1, \ldots)$ is i.i.d. with law $\mu$. Note that, almost surely, $i$ and $j$ are in the same block of $\Pi$ if and only if $X_i = X_j$. We now work conditionally given $\mu$. Note that $(X_1, \ldots)$ has the same law as $(q(V_1), \ldots)$, where $(V_1, \ldots)$ are i.i.d. uniform on $[0,1]$, and for $x \in (0,1)$, $q(x) = \inf\{y \in \mathbb{R} : F(y) > x\}$, where $F$ denotes the cumulative distribution function of $\mu$. Thus we deduce that $\Pi$ has the same law as the paintbox $\rho_s(\cdot)$, where $s = (s_0, s_1, \ldots) \in \mathcal{S}_0$ is such that $(s_1, \ldots)$ gives the ordered list of the masses of the atoms of $\mu$ and $s_0 = 1 - \sum_{i \ge 1} s_i$. □

We note that Kingman's original proof relies on a martingale argument,
which is in line with the modern proofs of De Finetti's
theorem (see, e.g., Durrett [65], (6.6) in Chapter 4). The interested 
reader is referred to [2] and [133], both of which contain a wealth of 
information about the subject. 

This theorem has several interesting and immediate consequences: if $\Pi$ is any exchangeable random partition, then the only finite blocks of $\Pi$ are the singletons, almost surely. Indeed, if a block is not a singleton, then it is infinite and in fact has a positive, well-defined asymptotic frequency (or mass), by the law of large numbers. The (random) vector $s \in \mathcal{S}_0$ can be entirely recovered from $\Pi$: if $\Pi$ has any singleton at all, then a positive proportion of integers are singletons, and that proportion is equal to $s_0$. Moreover, $(s_1, \ldots)$ is the sequence of block masses, ranked in nonincreasing order. In particular, if $\Pi = (B_1, B_2, \ldots)$ then

$$|B_1| + |B_2| + \cdots = 1 - s_0 \quad \text{a.s.}$$

There is thus a complete correspondence between the random exchangeable partition $\Pi$ and the sequence $s \in \mathcal{S}_0$:

$$\Pi \in \mathcal{P} \longleftrightarrow s \in \mathcal{S}_0.$$
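On a finite sample, this recovery of $s$ from $\Pi$ can only be imitated approximately. The Python sketch below is our own crude estimator (the function name `empirical_tiling` is hypothetical): it attributes every singleton of $\Pi|_{[n]}$ to the dust and reads the other masses off as empirical frequencies. For finite $n$ a non-dust block may still appear as a singleton, so this is an approximation that improves as $n$ grows, not an exact inverse.

```python
def empirical_tiling(blocks, n):
    """Crude finite-n estimate of (s_0, [s_1, s_2, ...]) from the blocks of Pi|[n].

    Singletons are counted as dust; every other block contributes its
    empirical frequency, and those frequencies are ranked nonincreasingly.
    """
    s0 = sum(1 for b in blocks if len(b) == 1) / n
    masses = sorted((len(b) / n for b in blocks if len(b) > 1), reverse=True)
    return s0, masses

# The partition of [8] shown in Figure 1.
print(empirical_tiling([[1, 4], [2], [3, 7], [5], [6], [8]], 8))  # (0.5, [0.25, 0.25])
```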

Corollary 1.1. This correspondence is a one-to-one map between the laws of exchangeable random partitions $\Pi$ and distributions $\mu$ on $\mathcal{S}_0$. This map is Kingman's correspondence.

Furthermore, this correspondence is continuous when $\mathcal{S}_0$ is equipped with the appropriate topology, namely the topology associated with pointwise convergence of the "non-dust" entries: that is, $s^\varepsilon \to s$ as $\varepsilon \to 0$ if and only if $s^\varepsilon_k \to s_k$ for all $k \ge 1$ (but not necessarily for $k = 0$).

Theorem 1.2. Convergence in distribution of the random partitions $(\Pi_\varepsilon)_{\varepsilon > 0}$ is equivalent to the convergence in distribution of their ranked frequencies $(s^\varepsilon_1, s^\varepsilon_2, \ldots)_{\varepsilon > 0}$.

The proof is easy and can be found, for instance, in Pitman [133], Theorem 2.3. It is easy to see that the correspondence cannot be continuous with respect to the restriction of the $\ell^1$ metric to $\mathcal{S}_0$ (think about a state with many blocks of small but positive frequencies and no dust: this is "close" to the pure dust state from the point of view of pointwise convergence, and hence from the point of view of sampling, but not at all from the point of view of the $\ell^1$ metric).

1.2 Size-biased picking 
1.2.1 Single pick 

When given an exchangeable random partition $\Pi$, it is natural to ask what is the mass of a "typical" block. If $\Pi$ has only a finite number of blocks, one can choose a block uniformly at random among all blocks present. But when there are infinitely many blocks, it is not possible to do so. In that case, one may instead consider the block containing a given integer, say 1. The partition being exchangeable, this block may indeed be thought of as a generic or typical block, and the advantage is that this is possible whether there are finitely or infinitely many blocks. Its mass is, however, (slightly) larger than that of a typical block, since larger blocks are more likely to contain 1. When there are only a finite number of blocks, this is expressed as follows. Let $X$ be the mass of the block containing 1, and let $Y$ be the mass of a randomly chosen block of the random exchangeable partition $\Pi$. Then, conditionally given the block masses, the reader can easily verify that

$$\mathbb{P}(X \in dx) = \frac{x}{\mathbb{E}(Y)}\, \mathbb{P}(Y \in dx), \qquad x > 0. \qquad (2)$$

If a pair of random variables $(X, Y)$ satisfies relation (2), we say that $X$ has the size-biased distribution of $Y$. For this reason, here we say that $X$ is the mass of a size-biased picked block.
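For a fixed tiling with finitely many blocks and no dust, relation (2) can be checked directly. In this Python sketch (the function names are ours), a size-biased pick is drawn with `random.choices`, weighting each tile by its own mass, and the exact mean $\mathbb{E}(X) = \sum_i s_i^2$ is compared with the mean mass $\mathbb{E}(Y) = 1/K$ of a uniformly chosen block among the $K$ blocks.

```python
import random

def size_biased_pick(s, rng=random):
    """Mass of the tile hit by a uniform point, for a dust-free tiling s = (s_1, ..., s_K)."""
    return rng.choices(s, weights=s, k=1)[0]

def size_biased_expectation(s, f):
    """Exact E f(X) for a size-biased pick X from s, namely sum_i s_i f(s_i)."""
    return sum(si * f(si) for si in s)

s = [0.5, 0.25, 0.25]
uniform_block_mean = sum(s) / len(s)                         # E(Y) = 1/3
size_biased_mean = size_biased_expectation(s, lambda x: x)  # E(X) = 0.375
```

The size-biased mean $0.375$ exceeds the uniform mean $1/3$, in line with the remark that the block containing 1 tends to be larger than a typical block.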
In terms of Kingman's correspondence, $X$ has a natural interpretation when there is no dust. In that case, if $\Pi$ is viewed as a random tiling $s \in \mathcal{S}_0$ of the unit interval, then $X$ is also the length of the tile containing a point chosen uniformly at random in the unit interval.

Not surprisingly, many of the properties of $\Pi$ can be read from the distribution of $X$ alone. (Needless to say, though, the law of $X$ does not fully characterize that of $\Pi$.)

Theorem 1.3. Let $\Pi$ be a random exchangeable partition with ranked frequencies $(P_j)_{j \ge 1}$. Assume that there is no dust almost surely, and let $f$ be any nonnegative function. Then:

$$\mathbb{E}\Big( \sum_{j \ge 1} f(P_j) \Big) = \int_0^1 \frac{f(x)}{x}\, \mu(dx), \qquad (3)$$

where $\mu$ is the law of the mass $X$ of a size-biased picked block.

Proof. The proof follows from looking at the function $g(x) = f(x)/x$ and observing that $\mathbb{E}(g(X)) = \mathbb{E}\big(\sum_i P_i\, g(P_i)\big)$, which itself is a consequence of Kingman's correspondence, since the $P_i$ are simply equal to the coordinates $(s_1, \ldots)$ of the sequence $s \in \mathcal{S}_0$, and $U_1$ falls in the $j$th of them with probability $s_j$. □

Thus it follows that the $n$th moment of $X$ is related to the sum of the $(n+1)$th moments of all frequencies:

$$\mathbb{E}(X^n) = \mathbb{E}\Big( \sum_{j \ge 1} P_j^{n+1} \Big). \qquad (4)$$

In particular, for $n = 1$ we have:

$$\mathbb{E}(X) = \mathbb{E}\Big( \sum_{j \ge 1} P_j^2 \Big).$$

This identity is obvious when one realises that both sides of this equation can be interpreted as the probability that two randomly chosen points fall in the same component. This of course also applies to (4), which is the probability that $n + 1$ randomly chosen points are in the same component. The following identity is a useful application of Theorem 1.3:
Theorem 1.4. Let $\Pi$ be a random exchangeable partition, and let $N$ be the number of blocks of $\Pi$. Then we have the formula:

$$\mathbb{E}(N) = \mathbb{E}(1/X).$$

To explain the result, note that if we see that the block containing 1 has a small frequency $\varepsilon > 0$, then we can expect roughly $1/\varepsilon$ blocks in total (since that would be the answer if all blocks had frequency exactly $\varepsilon$).

Proof. Note first that the result is obvious if $\Pi$ has some dust with positive probability, as both sides are then infinite. So assume that $\Pi$ has no dust almost surely, and let $N_n$ be the number of blocks of $\Pi$ restricted to $[n]$. Then, by Theorem 1.3 applied to $f(x) = 1 - (1-x)^n$:

$$\mathbb{E}(N_n) = \sum_i \mathbb{P}(\text{part } i \text{ is chosen among the first } n \text{ picks}) = \sum_i \mathbb{E}\big(1 - (1 - P_i)^n\big) = \mathbb{E}(f_n(X)),$$

say, where

$$f_n(x) = \frac{1 - (1-x)^n}{x}.$$

Letting $n \to \infty$, since $X > 0$ almost surely because there is no dust, $f_n(X) \to 1/X$ almost surely. This convergence is also monotone, as is $N_n \to N$, so by monotone convergence we conclude

$$\mathbb{E}(N) = \mathbb{E}(1/X)$$

as required. □
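For a fixed tiling with finitely many blocks and no dust, every quantity in this proof can be computed exactly. The Python sketch below (function names are ours) checks numerically that $\mathbb{E}(N_n) = \sum_i (1 - (1 - s_i)^n)$ coincides with $\mathbb{E}(f_n(X))$, and that $\mathbb{E}(1/X)$ equals the number of blocks.

```python
def expected_blocks_in_sample(s, n):
    """E(N_n) = sum_i (1 - (1 - s_i)^n) for a fixed dust-free tiling s."""
    return sum(1 - (1 - si) ** n for si in s)

def f_n(x, n):
    """The function f_n(x) = (1 - (1 - x)^n) / x from the proof."""
    return (1 - (1 - x) ** n) / x

def size_biased_expectation(s, f):
    """Exact E f(X) when X is a size-biased pick from s: sum_i s_i f(s_i)."""
    return sum(si * f(si) for si in s)

s = [0.5, 0.3, 0.2]
for n in (1, 5, 50, 500):
    # Theorem 1.3 with f(x) = 1 - (1 - x)^n gives E(N_n) = E(f_n(X)).
    lhs = expected_blocks_in_sample(s, n)
    rhs = size_biased_expectation(s, lambda x: f_n(x, n))
    assert abs(lhs - rhs) < 1e-12

# As n grows, E(N_n) increases to E(1/X), which is approximately 3 (the number of blocks).
limit_value = size_biased_expectation(s, lambda x: 1 / x)
```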

Theorem 1.4 will often guide our intuition when studying the small-time behaviour of coalescent processes that come down from infinity (rigorous definitions will be given shortly). Basically, this is the study of coalescent processes close to the time at which they experience a "big-bang" event, going from a state of pure dust to a state made of finitely many solid blocks (i.e., with positive mass). Close to this time, we have a very large number of small blocks. Any information on $N$ can then be hoped to carry over to $X$, and conversely.
1.2.2 Multiple picks, size-biased ordering

Let $X = X_1$ denote the mass of a size-biased picked block. One can then define further statistics which refine our description of $\Pi$. Recall that if $\Pi = (B_1, B_2, \ldots)$ with blocks ordered according to their least elements, then $X_1 = |B_1|$ is by definition the mass of a size-biased picked block. Define similarly $X_2 = |B_2|, \ldots, X_n = |B_n|$, and so on. Then $(X_1, \ldots)$ corresponds to sampling the blocks of $\Pi$ without replacement, with a size bias at every step.

Note that if $\Pi$ has no dust, then $(X_1, \ldots)$ is just a reordering of the sequence $(s_1, \ldots)$, where $s$ denotes the ranked frequencies of $\Pi$, or equivalently the image of $\Pi$ by Kingman's correspondence. That is, there exists a permutation $\sigma: \mathbb{N} \to \mathbb{N}$ such that

$$X_i = s_{\sigma(i)}, \qquad i \ge 1.$$

This permutation is the size-biased ordering of $s$. It satisfies:

$$\mathbb{P}(\sigma(1) = j \mid s) = s_j.$$

Moreover, given $s$ and given $\sigma(1), \ldots, \sigma(i-1)$, we have, for every $j \notin \{\sigma(1), \ldots, \sigma(i-1)\}$:

$$\mathbb{P}\big(\sigma(i) = j \mid s, \sigma(1), \ldots, \sigma(i-1)\big) = \frac{s_j}{1 - s_{\sigma(1)} - \cdots - s_{\sigma(i-1)}}.$$

Although slightly more complicated, the size-biased ordering $(X_1, \ldots)$ of $s$ is often more natural than the nonincreasing rearrangement which defines $s$.
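Both the size-biased ordering and the multi-pick generalisation of Theorem 1.4 can be explored on a finite tiling. In the Python sketch below (function names are ours), `size_biased_order` samples without replacement with weights proportional to the remaining masses; `random.choices` normalises the weights, which plays the role of the denominator $1 - s_{\sigma(1)} - \cdots - s_{\sigma(i-1)}$. The second function computes, by exact enumeration, $\mathbb{E}\big(\prod_{i=1}^k (1 - X_1 - \cdots - X_{i-1})/X_i\big)$; this is our reading of the identity intended in the exercise below, and for $K$ blocks it should return the number $K(K-1)\cdots(K-k+1)$ of ordered $k$-tuples of distinct blocks.

```python
import itertools
import random

def size_biased_order(s, rng=random):
    """Size-biased permutation (X_1, X_2, ...) of a dust-free tiling s = (s_1, ..., s_K)."""
    remaining = [x for x in s if x > 0]
    order = []
    while remaining:
        # choices() normalises the weights, i.e. divides by the mass still unpicked.
        j = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        order.append(remaining.pop(j))
    return order

def ordered_tuple_count(s, k):
    """Exact E( prod_i (1 - X_1 - ... - X_{i-1}) / X_i ) over size-biased k-tuples."""
    total = 0.0
    for tup in itertools.permutations(s, k):
        prob, weight, used = 1.0, 1.0, 0.0
        for x in tup:
            prob *= x / (1 - used)      # P(next pick is x | previous picks)
            weight *= (1 - used) / x    # factor inside the expectation
            used += x
        total += prob * weight
    return total
```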

As an exercise, the reader is invited to verify that Theorem 1.4 can be generalised to this setup to yield: if $N$ is the number of ordered $k$-tuples of distinct blocks in the random exchangeable partition $\Pi$, then

$$\mathbb{E}(N) = \mathbb{E}\Big( \frac{1}{X_1} \cdot \frac{1 - X_1}{X_2} \cdots \frac{1 - X_1 - \cdots - X_{k-1}}{X_k} \Big).$$

This is potentially useful to establish limit theorems for the distribution of the number of blocks in a coalescent, but this possibility has not been explored to this date.

1.3 The Poisson-Dirichlet random partition 

We are now going to spend some time describing a particular family of random partitions called the Poisson-Dirichlet partitions. These partitions are ubiquitous in this field, playing the role of the normal random variable in standard probability theory. They arise in a huge variety of contexts: not only coalescence and population genetics (which is our main reason to talk about them in these notes), but also random permutations, number theory [62], Brownian motion [133], spin glass models [40], random surfaces [86]... In its most general incarnation, this is a two-parameter family of random partitions, and the parameters are usually denoted by $(\alpha, \theta)$. However, the most interesting cases occur when either $\alpha = 0$ or $\theta = 0$, and so to keep these notes as simple as possible we will restrict our presentation to those two cases.



1.3.1 Case $\alpha = 0$.

We start with the case $\alpha = 0$, $\theta > 0$. We recall that a random variable $X$ has the Beta$(a, b)$ distribution (where $a, b > 0$) if its density at $x \in (0,1)$ is:

$$\frac{\mathbb{P}(X \in dx)}{dx} = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1} (1-x)^{b-1}.$$

Thus the Beta$(1, \theta)$ distribution ($\theta > 0$) is the distribution on $(0,1)$ with density $\theta (1-x)^{\theta - 1}$, and this is uniform if $\theta = 1$. If $a, b \in \mathbb{N}$, the Beta$(a, b)$ distribution has the following interpretation: take $a + b$ independent standard exponential random variables, and consider the ratio of the sum of the first $a$ of them to the total sum. Alternatively, drop $a + b - 1$ independent uniform random points in the unit interval and order them increasingly. Then the position of the $a$th point is a Beta$(a, b)$ random variable.
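Both interpretations are easy to test by simulation for integer $a, b \ge 1$. In the Python sketch below (function names are ours), each sampler should produce Beta$(a, b)$ variables, whose mean is $a/(a+b)$; note that the order-statistics version uses $a + b - 1$ uniform points.

```python
import random

def beta_from_exponentials(a, b, rng=random):
    """(E_1 + ... + E_a) / (E_1 + ... + E_{a+b}) for i.i.d. standard exponentials E_i."""
    e = [rng.expovariate(1.0) for _ in range(a + b)]
    return sum(e[:a]) / sum(e)

def beta_from_order_statistics(a, b, rng=random):
    """The a-th smallest of a + b - 1 i.i.d. uniform points on (0, 1)."""
    u = sorted(rng.random() for _ in range(a + b - 1))
    return u[a - 1]
```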

Definition 1.3. (Stick-breaking construction, $\alpha = 0$.) The Poisson-Dirichlet random partition is the paintbox partition associated with the nonincreasing reordering of the sequence

$$P_1 = W_1, \quad P_2 = (1 - P_1) W_2, \quad \ldots, \quad P_{n+1} = (1 - P_1 - \cdots - P_n) W_{n+1}, \quad \ldots \qquad (7)$$

where the $W_i$ are i.i.d. random variables with

$$W_i \sim \text{Beta}(1, \theta).$$

We write $\Pi \sim PD(0, \theta)$.

To explain the name of this construction, imagine we start with a stick of unit length. Then we break the stick into two pieces, of lengths $W_1$ and $1 - W_1$. One of these two pieces (the one of length $W_1$) we put aside and will never touch again. To the other, we apply the previous construction repeatedly, each time breaking off a piece whose length is a Beta$(1, \theta)$-distributed fraction of the current length of the stick. In particular, note that when $\theta = 1$, these fractions are uniformly distributed.
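A minimal Python sketch of the stick-breaking scheme (7), truncated after a fixed number of pieces (the truncation and the function name are our own choices), using the standard-library sampler `random.betavariate`. It returns the pieces in stick-breaking order together with the length of stick left over; the paintbox of the definition uses their nonincreasing reordering.

```python
import random

def stick_breaking_pd0(theta, n_pieces, rng=random):
    """First n_pieces masses of the stick-breaking scheme (7), with W_i ~ Beta(1, theta).

    Returns (pieces in stick-breaking order, length of stick left over).
    """
    pieces, rest = [], 1.0
    for _ in range(n_pieces):
        w = rng.betavariate(1.0, theta)  # fraction of the current stick broken off
        pieces.append(rest * w)
        rest *= 1.0 - w
    return pieces, rest

pieces, rest = stick_breaking_pd0(theta=2.0, n_pieces=50)
ranked = sorted(pieces, reverse=True)  # nonincreasing reordering used by the paintbox
```

Since $\mathbb{E}(1 - W_i) = \theta/(1+\theta)$, the expected leftover length after $n$ pieces is $(\theta/(1+\theta))^n$, which gives a simple guide for choosing the truncation level.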

While the above construction tells us what the asymptotic frequencies of the blocks are, there is a much more visual and appealing way of describing this partition, which goes by the name of "Chinese restaurant process". Let $\Pi_n$ be the partition of $[n]$ defined inductively as follows: initially, $\Pi_1$ is just the trivial partition $\{\{1\}\}$. Given $\Pi_n$, we build $\Pi_{n+1}$ as follows. The restriction of $\Pi_{n+1}$ to $[n]$ will be exactly $\Pi_n$, hence it suffices to assign a block to $n + 1$. With probability $\theta/(n + \theta)$, $n + 1$ starts a new block. Otherwise, $n + 1$ is assigned to a block of size $m$ with probability $m/(n + \theta)$. This can be summarized as follows:

$$\begin{cases} \text{start new block:} & \text{with probability } \dfrac{\theta}{n + \theta}, \\[2mm] \text{join block of size } m: & \text{with probability } \dfrac{m}{n + \theta}. \end{cases} \qquad (8)$$

This defines a (consistent) family of partitions $\Pi_n$, hence there is no problem in extending this definition to a random partition $\Pi$ of $\mathbb{N}$ such that $\Pi|_{[n]} = \Pi_n$ for all $n \ge 1$: indeed, if $i, j \ge 1$, it suffices to say whether $i \sim j$ or not, and in order to decide this, it suffices to check on $\Pi_n$ where $n = \max(i, j)$. This procedure thus uniquely specifies $\Pi$.

The name "Chinese Restaurant Process" comes from the following interpretation in the case $\theta = 1$: customers arrive one by one in an empty restaurant which has round tables. Initially, customer 1 sits by himself. When the $(n+1)$th customer arrives, she chooses uniformly at random between sitting at a new table or sitting directly to the right of one of the $n$ individuals already present. The partition structure obtained by identifying individuals seated at the same table is that of the Chinese Restaurant Process.

Theorem 1.5. The random partition $\Pi$ obtained from the Chinese restaurant process (8) is a Poisson-Dirichlet random partition with parameters $(0, \theta)$. In particular, $\Pi$ is exchangeable. Moreover, the size-biased ordering of the asymptotic block frequencies is the one given by the stick-breaking order (7).

Proof. The proof is a simple (and quite beautiful) application of Pólya's urn theorem. In Pólya's urn, we start with one red ball and a number $\theta$ of black balls. At each step, we choose one of the balls uniformly at random in the urn, and put it back in the urn along with another one of the same colour. Pólya's classical result says that the asymptotic proportion of red balls converges to a Beta$(1, \theta)$ random variable. Note also that this urn model may be formally defined even when $\theta$ is not an integer (choosing a colour with probability proportional to its total weight), and the result stays true in this case.

Now, coming back to the Chinese Restaurant Process, consider the block containing 1. Imagine that to each $1 \le i \le n$ is associated a ball in an urn, and that this ball is red if $i \sim 1$, and black otherwise, say. Note that, by construction, if at stage $n$, $B_1$ contains $r \ge 1$ integers, then as the new integer $n + 1$ is added to the partition, it joins $B_1$ with probability $r/(n + \theta)$ and does not with the complementary probability. Assigning the colour red to $B_1$ and black otherwise, this is the same as thinking that there are $r$ red balls in the urn and $n - r + \theta$ black balls, and that we pick one of the balls at random and put it back along with another one of the same colour (whether this corresponds to joining one of the existing blocks or to creating a new one!). Initially (for $n = 1$), the urn contains 1 red ball and $\theta$ black balls. Thus, writing $X_n(1)$ for the number of integers of $[n]$ in $B_1$ (that is, the number of red balls at stage $n$), the proportion of red balls in the urn satisfies:

$$\frac{X_n(1)}{n} \to W_1, \quad \text{a.s.},$$

where $W_1$ is a Beta$(1, \theta)$ random variable. (This result is usually more familiar in the case $\theta = 1$, in which case $W_1$ is simply a uniform random variable.)

Now, observe that the stick-breaking property is in fact a consequence of the Chinese restaurant process construction (8). Let $i_1 = 1$ and let $i_2$ be the first $i$ such that $i$ is not in the same block as 1. If we ignore the block $B_1$ containing 1, and look at the next block $B_2$ (which contains $i_2$), it is easy to see by the same Pólya urn argument that the asymptotic fraction of integers $i \in B_2$ among those that are not in $B_1$ is a random variable $W_2$ with the Beta$(1, \theta)$ distribution. Hence $|B_2| = (1 - P_1) W_2$. Arguing by induction as above, one obtains that the blocks $(B_1, B_2, \ldots)$, listed in order of appearance, satisfy the stick-breaking construction (7).
It remains to show exchangeability of the partition, but this is a consequence of the fact that, in Pólya's urn, given the limiting proportion $W$ of red balls, the urn can be realised as i.i.d. coin-tossing with heads probability $W$. It is easy to see from this observation that we get exchangeability. □
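The Chinese restaurant process (8) is equally easy to simulate. The following Python sketch is our own (the function name is hypothetical); blocks are kept in order of their least elements. Since the probability that $m + 1$ starts a new block is $\theta/(m + \theta)$ regardless of the current configuration, the expected number of blocks of $\Pi_n$ is $\sum_{m=0}^{n-1} \theta/(m + \theta)$, which gives a convenient sanity check for the simulation.

```python
import random

def chinese_restaurant(theta, n, rng=random):
    """Sample Pi_n from the Chinese restaurant process (8).

    Blocks are lists of integers, ordered by their least elements.
    """
    blocks = [[1]]                     # Pi_1 = {{1}}
    for m in range(1, n):              # m integers placed so far; now place m + 1
        u = rng.random() * (m + theta)
        if u < theta:                  # new block, probability theta / (m + theta)
            blocks.append([m + 1])
            continue
        u -= theta
        for block in blocks:           # join a block of size k w.p. k / (m + theta)
            if u < len(block):
                block.append(m + 1)
                break
            u -= len(block)
        else:                          # only reachable through float round-off
            blocks[-1].append(m + 1)
    return blocks
```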

As a consequence of this remarkable construction, there is an exact expression for the probability distribution of $\Pi_n$. As it turns out, this formula will be quite useful for us. It is known (for reasons that will become clear in the next chapter) as Ewens' sampling formula.

Theorem 1.6. Let $\pi$ be any given partition of $[n]$, whose block sizes are $n_1, \ldots, n_k$. Then

$$\mathbb{P}(\Pi_n = \pi) = \frac{\theta^k}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{i=1}^{k} (n_i - 1)!.$$

Proof. This formula is obvious by induction on n from the Chinese 
restaurant process construction. It could also be computed directly 
through some tedious integral computations ("Beta-Gamma" algebra). □
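Ewens' sampling formula can be checked against the step-by-step probabilities of (8) by brute force for small $n$. The Python sketch below is our own illustration (function names are hypothetical); it uses exact rational arithmetic from the standard `fractions` module, enumerates all partitions of $[n]$, and compares the two expressions.

```python
from fractions import Fraction
from math import factorial

def set_partitions(n):
    """Yield every partition of [n] as a list of blocks ordered by least element."""
    if n == 0:
        yield []
        return
    for p in set_partitions(n - 1):
        for i in range(len(p)):
            yield [b + [n] if j == i else b for j, b in enumerate(p)]
        yield p + [[n]]

def ewens(partition, theta):
    """theta^k / (theta (theta+1) ... (theta+n-1)) * prod_i (n_i - 1)!."""
    sizes = [len(b) for b in partition]
    n, k = sum(sizes), len(sizes)
    rising = Fraction(1)
    for j in range(n):
        rising *= theta + j
    prod = 1
    for m in sizes:
        prod *= factorial(m - 1)
    return Fraction(theta) ** k * prod / rising

def crp_probability(partition, theta):
    """Probability of building `partition` one integer at a time with rule (8)."""
    owner = {x: bi for bi, b in enumerate(partition) for x in b}
    prob, sizes = Fraction(1), {}
    for m in range(1, len(owner) + 1):  # place m, given Pi_{m-1}
        b = owner[m]
        if b not in sizes:              # new block (probability 1 when m = 1)
            prob *= Fraction(theta) / (m - 1 + theta)
            sizes[b] = 1
        else:
            prob *= Fraction(sizes[b]) / (m - 1 + theta)
            sizes[b] += 1
    return prob

print(ewens([[1, 2], [3]], 1))  # 1/6
```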

1.3.2 Case $\theta = 0$.

Let $0 < \alpha < 1$ and let $\theta = 0$.

Definition 1.4. (Stick-breaking construction, $\theta = 0$.) The Poisson-Dirichlet random partition with parameters $(\alpha, 0)$ is the random partition obtained from the stick-breaking construction, where at the $i$th step the fraction of the current stick to be cut off is $W_i \sim \text{Beta}(1 - \alpha, i\alpha)$, the $W_i$ being independent. That is,

$$P_1 = W_1, \qquad P_{n+1} = (1 - P_1 - \cdots - P_n) W_{n+1}. \qquad (9)$$

There is also a "Chinese restaurant process" construction in this case. The modification is as follows. If $\Pi_n$ has $k$ blocks of sizes $n_1, \ldots, n_k$, then $\Pi_{n+1}$ is obtained by performing the following operation on $n + 1$:

$$\begin{cases} \text{start new block:} & \text{with probability } \dfrac{k\alpha}{n}, \\[2mm] \text{join block of size } m: & \text{with probability } \dfrac{m - \alpha}{n}. \end{cases} \qquad (10)$$

It can be shown, using urn techniques for instance, that this construction yields a partition with the same law as the paintbox partition associated with the stick-breaking process (9).

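As a result of this construction, Ewens' sampling formula can also be generalised to this setting, and becomes:

$$\mathbb{P}(\Pi_n = \pi) = \frac{\alpha^{k-1} (k-1)!}{(n-1)!} \prod_{i=1}^{k} (1 - \alpha) \cdots (n_i - 1 - \alpha), \qquad (11)$$

where $\pi$ is any given partition of $[n]$ into blocks of sizes $n_1, \ldots, n_k$ (the product $(1 - \alpha) \cdots (n_i - 1 - \alpha)$ being equal to 1 when $n_i = 1$).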
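As in the case $\alpha = 0$, formula (11) can be checked exactly against the sequential rule (10) for small $n$. The Python sketch below is our own (the names `formula_11` and `crp_alpha_probability` are hypothetical); it enumerates all 52 partitions of $[5]$ and uses exact rational arithmetic.

```python
from fractions import Fraction
from math import factorial

def set_partitions(n):
    """Yield every partition of [n] as a list of blocks ordered by least element."""
    if n == 0:
        yield []
        return
    for p in set_partitions(n - 1):
        for i in range(len(p)):
            yield [b + [n] if j == i else b for j, b in enumerate(p)]
        yield p + [[n]]

def formula_11(partition, alpha):
    """alpha^(k-1) (k-1)! / (n-1)! * prod_i (1 - alpha)...(n_i - 1 - alpha)."""
    sizes = [len(b) for b in partition]
    n, k = sum(sizes), len(sizes)
    prod = Fraction(1)
    for m in sizes:
        for j in range(1, m):          # empty product when the block is a singleton
            prod *= j - alpha
    return Fraction(alpha) ** (k - 1) * factorial(k - 1) * prod / factorial(n - 1)

def crp_alpha_probability(partition, alpha):
    """Probability of building `partition` one integer at a time with rule (10)."""
    owner = {x: bi for bi, b in enumerate(partition) for x in b}
    sizes = {owner[1]: 1}              # Pi_1 = {{1}}
    prob = Fraction(1)
    for m in range(2, len(owner) + 1):  # place m, given Pi_{m-1} with m - 1 integers
        b = owner[m]
        if b not in sizes:             # new block: k * alpha / (m - 1)
            prob *= len(sizes) * Fraction(alpha) / (m - 1)
            sizes[b] = 1
        else:                          # join block of size n_b: (n_b - alpha) / (m - 1)
            prob *= (sizes[b] - Fraction(alpha)) / (m - 1)
            sizes[b] += 1
    return prob
```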
1.3.3 A Poisson construction 

At this stage, we have seen essentially two constructions of a Poisson-Dirichlet random variable with $\theta = 0$ and $0 < \alpha < 1$. The first one is based on the stick-breaking scheme, and the other on the Chinese Restaurant Process. Here we discuss a third construction which will come in very handy at several places in these notes, and which is based on a Poisson process. More precisely, let $0 < \alpha < 1$ and let $\mathcal{M}$ denote the points of a Poisson random measure on $(0, \infty)$ with intensity $\mu(dx) = x^{-\alpha - 1}\, dx$:

$$\mathcal{M}(dx) = \sum_{i \ge 1} \delta_{Y_i}(dx).$$

In the above, we assume that the $Y_i$ are ranked in decreasing order, i.e., $Y_1$ is the largest point of $\mathcal{M}$, $Y_2$ the second largest, and so on. This is possible because a.s. $\mathcal{M}$ has only a finite number of points in $(\varepsilon, \infty)$ for every $\varepsilon > 0$ (since $\alpha > 0$). It also turns out that, almost surely,

$$\sum_{i=1}^{\infty} Y_i < \infty. \qquad (12)$$

Indeed, observe that

$$\mathbb{E}\Big( \sum_{i=1}^{\infty} Y_i \mathbf{1}_{\{Y_i < 1\}} \Big) = \int_0^1 x\, \mu(dx) < \infty$$

(the integral is finite since $\alpha < 1$), and so $\sum_i Y_i \mathbf{1}_{\{Y_i < 1\}} < \infty$ almost surely. Since there are only a finite number of terms outside of $(0, 1)$, this proves (12). We may now state the theorem we have in mind:

Theorem 1.7. For all $n \ge 1$, let $P_n = Y_n / \sum_{i \ge 1} Y_i$. Then the distribution of $(P_n, n \ge 1)$ is that of a Poisson-Dirichlet random variable with parameters $\alpha$ and $\theta = 0$.

The proof is somewhat technical (being based on explicit density 
calculations) and we do not include it in these notes. However we 
refer the reader to the paper of Perman, Pitman and Yor [130] where 
this result is proved, and to section 4.1 of Pitman's notes [133] which 
contains some elements of the proof. 
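Theorem 1.7 also suggests a simple approximate simulation of $PD(\alpha, 0)$. The Python sketch below is our own illustration, not the method of [130] or [133]. It relies on the mapping theorem for Poisson processes: since $\mu((y, \infty)) = y^{-\alpha}/\alpha$, the ranked atoms can be written $Y_i = (\alpha \Gamma_i)^{-1/\alpha}$, where $\Gamma_1 < \Gamma_2 < \cdots$ are the arrival times of a rate-one Poisson process. Keeping only the first `n_points` atoms is a truncation we introduce; it is harmless when $\alpha$ is not close to 1.

```python
import random

def poisson_pd_alpha(alpha, n_points, rng=random):
    """Approximate (P_1, P_2, ...) of Theorem 1.7 using the n_points largest atoms.

    Y_i = (alpha * Gamma_i)^(-1/alpha), where Gamma_i are the arrival times of a
    rate-one Poisson process; atoms beyond n_points are dropped (truncation).
    """
    gamma, ys = 0.0, []
    for _ in range(n_points):
        gamma += rng.expovariate(1.0)                   # next arrival time Gamma_i
        ys.append((alpha * gamma) ** (-1.0 / alpha))    # i-th largest atom Y_i
    total = sum(ys)
    return [y / total for y in ys]
```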

We also mention that there exists a similar construction in the case $\alpha = 0$ and $\theta > 0$. The intensity of the Poisson point process $\mathcal{M}$ should then be chosen as $\mu(dx) = \theta x^{-1} e^{-x}\, dx$, which was Kingman's original definition of the Poisson-Dirichlet distribution [105]. See also Section 4.11 in [9] and Theorem 3.12 in [133], where the credit is given to Ferguson [83] for this result.

1.4 Some examples 

As an illustration of the usefulness of the Poisson-Dirichlet distribution, we give two classical examples of situations in which it arises: on the one hand, the cycle decomposition of random permutations, and on the other hand, the factorization into primes of a "random" large integer. A great source of information for these two examples is [9, Chapter 1], where much more is discussed. In the next chapter, we will focus (at length) on another incarnation of this partition, namely that of population genetics via Kingman's coalescent. In Chapter 6 we will encounter yet another one, within the physics of spin glasses.

1.4.1 Random permutations. 

Let $S_n$ be the set of permutations of $S = \{1, \ldots, n\}$. If $\sigma \in S_n$, there is a natural action of $\sigma$ on the set S, which partitions S into orbits. This partition is called the cycle decomposition of $\sigma$. For instance, if

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 3 & 2 & 4 & 1 & 7 & 5 & 6 \end{pmatrix}$$

then the cycle decomposition of $\sigma$ is

$$\sigma = (1\,3\,4)(2)(5\,7\,6). \qquad (13)$$

This simply means that 1 is mapped into 3, 3 into 4 and 4 back into 1, and so on for the other cycles. Cycles are the basic building blocks of permutations, much as primes are the basic building blocks of integers. This decomposition is unique, up to order of course. If we further ask the cycles to be ordered by increasing least elements (as above), then this representation is unique. Let $\sigma$ be a randomly chosen permutation (i.e., chosen uniformly at random). The following result describes the limiting behaviour of the cycle decomposition of $\sigma$. Let $L^{(n)} = (L_1, L_2, \ldots, L_k)$ denote the cycle lengths of $\sigma$, ordered by their least elements, and let $X^{(n)} = (L_1/n, \ldots, L_k/n)$ be the normalized vector, which tiles the unit interval (0,1).

Theorem 1.8. There is the following convergence in distribution:

$$X^{(n)} \xrightarrow{\ d\ } (P_1, P_2, \ldots)$$

where $(P_1, P_2, \ldots)$ are the asymptotic frequencies of a PD(0,1) random variable in size-biased order.

(Naturally the convergence in distribution is with respect to the topology on $\mathcal{S}_0$ defined earlier, i.e., pointwise convergence of positive mass entries; in fact, this convergence also holds for the restriction of the $\ell^1$ metric.)

Proof. There is a very simple proof of this result. It relies on a construction due to Feller, which shows that the stick-breaking property holds even at the discrete level. The cycle decomposition of $\sigma$ can be realised as follows. Start with the cycle containing 1. At this stage, the permutation looks like

σ = (1

and we must choose what symbol to put next. This could be any number of $\{2, \ldots, n\}$ or the symbol ")" which closes the cycle. Thus there are n possibilities at this stage, and the Feller construction is to choose among all those uniformly at random. Say that our choice leads us to:

σ = (1 5
At this stage, we must choose among a number of possible symbols: every number except 1 and 5 is allowed, and we are also allowed to close the cycle. Again, one must choose uniformly among those possibilities, and do so until one eventually chooses to close the cycle. Say that this happens at the third step:

σ = (1 5 2)

At this point, to pursue the construction we open a new cycle with the smallest unused number, in this case 3. Thus the permutation looks like:

σ = (1 5 2)(3

At each stage, we choose uniformly among all legal options, which are to close the current cycle or to put a number which doesn't appear in the previous list.

Then it is obvious that the resulting permutation is uniformly random: for instance, if $n = 7$ and $\sigma_0 = (1\,3\,4)(2)(5\,7\,6)$, then

$$P(\sigma = \sigma_0) = \frac{1}{7}\cdot\frac{1}{6}\cdots\frac{1}{1} = \frac{1}{7!}$$

because at the $k$-th step of the construction, exactly $k$ numbers have already been written and thus there are $n-k+1$ symbols available (the $+1$ is for closing the cycle). Thus the Feller construction gives us a way to generate random permutations (which is an extremely convenient algorithm from a practical point of view, too).
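The Feller construction is also easy to code. The following is a minimal Python sketch; the function `feller_permutation` and its return format are hypothetical choices, not taken from the text. At each step it picks uniformly among the unused numbers and the closing symbol, and it opens each new cycle with the smallest unused number, so the cycles come out ordered by their least elements, as in (13).

```python
import random


def feller_permutation(n, rng):
    """Generate a uniform random permutation of {1, ..., n} with the Feller construction.

    Returns (sigma, cycles): sigma maps i to its image, and cycles lists the
    cycles ordered by their least elements, as in (13).
    """
    remaining = list(range(1, n + 1))      # unused numbers, kept in increasing order
    cycles = []
    while remaining:
        cycle = [remaining.pop(0)]         # open a new cycle with the smallest unused number
        while True:
            # Legal options: any unused number, or the symbol ")" closing the cycle.
            choice = rng.randrange(len(remaining) + 1)
            if choice == len(remaining):
                break                      # close the current cycle
            cycle.append(remaining.pop(choice))
        cycles.append(cycle)
    sigma = {}
    for cycle in cycles:
        for a, b in zip(cycle, cycle[1:] + cycle[:1]):
            sigma[a] = b                   # each element maps to the next one in its cycle
    return sigma, cycles
```

Over many runs with `n = 5`, the length of the first cycle, `len(cycles[0])`, should be close to uniform on {1, ..., 5}, in line with the computation of $P(L_1 = k)$ that follows.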

Now, note that $L_1$, which is the length of the first cycle, has a distribution which is uniform over $\{1, \ldots, n\}$. Indeed, for $1 \le k \le n$, the probability that $L_1 = k$ is the probability that the algorithm chooses among $n-1$ options out of $n$, and then $n-2$ out of $n-1$, etc., until finally at the $k$-th step the algorithm chooses to close the cycle (1 option out of $n-k+1$). Cancelling terms, we get:

$$P(L_1 = k) = \frac{n-1}{n}\cdot\frac{n-2}{n-1}\cdots\frac{n-k+1}{n-k+2}\cdot\frac{1}{n-k+1} = \frac{1}{n}.$$
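The cancellation can be checked with exact rational arithmetic. A small sketch, with a helper name of our own, using Python's `fractions.Fraction`:

```python
from fractions import Fraction


def prob_first_cycle_length(n, k):
    """P(L_1 = k) in the Feller construction, computed as the product in the text."""
    p = Fraction(1)
    for step in range(1, k):
        # At step `step` there are n - step + 1 symbols; n - step of them do not close.
        p *= Fraction(n - step, n - step + 1)
    return p * Fraction(1, n - k + 1)   # at step k, close the cycle: 1 of n - k + 1 options


print([prob_first_cycle_length(7, k) for k in range(1, 8)])
```

For n = 7 every entry of the printed list equals `Fraction(1, 7)`.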



One sees that, similarly, given $L_1$ and $\{L_1 < n\}$, $L_2$ is uniform on $\{1, \ldots, n - L_1\}$, by construction. More generally, given $(L_1, \ldots, L_k)$ and given that $\{L_1 + \cdots + L_k < n\}$, we have:

$$L_{k+1} \sim \text{Uniform on } \{1, \ldots, n - L_1 - \cdots - L_k\} \qquad (14)$$

which is exactly the analogue of (7). From this one deduces Theorem 1.8 easily. □

1.4.2 Prime number factorisation. 

Let $n > 1$ be a large integer, and let N be uniformly distributed on $\{1, \ldots, n\}$. What is the prime factorisation of N? Recall that one may write

$$N = \prod_{p\in\mathcal{P}} p^{\alpha_p} \qquad (15)$$

where $\mathcal{P}$ is the set of prime numbers and the $\alpha_p$ are nonnegative integers, and that this decomposition is unique. To transfer to the language of partitions, where we want to add the parts rather than multiply them, we take the logarithms and define:

$$L_1 = \log p_1, \ldots, L_k = \log p_k.$$

Here the $p_i$ are the primes such that $\alpha_p > 0$ in (15), and each prime p appears $\alpha_p$ times in this list. We further assume that $L_1 \ge \cdots \ge L_k$.

Theorem 1.9. Let $X^{(n)} = (L_1/\log n, \ldots, L_k/\log n)$. Then we have convergence in the sense of finite-dimensional distributions:

$$X^{(n)} \to (P_1, P_2, \ldots)$$

where $(P_1, P_2, \ldots)$ is the decreasing rearrangement of the asymptotic frequencies of a PD(0,1) random variable.

In particular, large prime factors appear each with multiplicity 1 with high probability as $n \to \infty$, since the coordinates of a PD(0,1) random variable are pairwise distinct almost surely. See (1.49) in [9], which credits Billingsley [33] for this result, and [62] for a different proof using size-biased ordering.
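For illustration, $X^{(n)}$ can be computed directly for a sampled N. The sketch below uses a hypothetical helper `normalized_log_prime_factors` based on plain trial division, so it is only suitable for moderate n. It returns the normalised logarithms of the prime factors with multiplicity, in decreasing order. Since $\sum_i L_i = \log N$, the entries always sum to $\log N/\log n \le 1$.

```python
import math
import random


def normalized_log_prime_factors(N, n):
    """Return X^(n) = (L_1/log n, ..., L_k/log n) for the prime factorisation (15) of N.

    Each prime p is repeated alpha_p times, and the entries are sorted so that
    L_1 >= ... >= L_k. Trial division is enough for an illustration.
    """
    logs, m, d = [], N, 2
    while d * d <= m:
        while m % d == 0:                 # d is prime here: smaller primes were divided out
            logs.append(math.log(d))
            m //= d
        d += 1
    if m > 1:
        logs.append(math.log(m))          # leftover factor is a prime larger than sqrt(N)
    return sorted((L / math.log(n) for L in logs), reverse=True)


n = 10**6
rng = random.Random(0)
N = rng.randint(1, n)                     # N uniform on {1, ..., n}
x = normalized_log_prime_factors(N, n)
```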

1.4.3 Brownian excursions. 

Let $(B_t, t \ge 0)$ be a standard Brownian motion, and consider the random partition obtained by applying the paintbox construction to the tiling of (0,1) defined by $Z \cap (0,1)$, where

$$Z = \{t \ge 0 : B_t = 0\}$$

is the zero-set of B.

Let $(P_1, P_2, \ldots)$ be the sizes of the tiles in size-biased order.

Figure 2: The tiling of (0,1) generated by the zeros of Brownian motion.

Theorem 1.10. $(P_1, P_2, \ldots)$ has the distribution of the asymptotic frequencies of a PD(1/2, 0) random variable.

Proof. The proof is not complicated but requires knowledge of excursion theory, which at this level we want to avoid, since this is only supposed to be an illustrating example. The main step is to observe that at the inverse local time $\tau_1 = \inf\{t > 0 : L_t = 1\}$, where $(L_t, t \ge 0)$ denotes the local time of B at 0, the excursion lengths are precisely a Poisson point process with intensity (a constant multiple of) $\mu(dx) = x^{-\alpha-1}\,dx$ with $\alpha = 1/2$. This is an immediate consequence of Itô's excursion theory for Brownian motion and of the fact that Itô's measure $\nu$ gives mass

$$\nu(e : |e| \in dt) = C\,t^{-3/2}\,dt$$

for some $C > 0$. From this and Theorem 1.7, it follows that the normalized excursion lengths at time $\tau_1$ have the PD(1/2, 0) distribution. One has to work slightly harder to get this at time 1 rather than at time $\tau_1$. More details and references can be found in [134], together with a wealth of other properties of Poisson-Dirichlet distributions. □
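A crude way to visualise the tiling of Figure 2 is to replace B by a simple random walk. This is only a discrete approximation, an assumption of the sketch rather than part of the argument above, and the hypothetical helper `zero_set_tiles` returns the tiles in left-to-right order rather than in size-biased order.

```python
import random


def zero_set_tiles(steps, rng):
    """Approximate the tiling of (0, 1) by the zeros of Brownian motion.

    Assumption: B is replaced by a simple random walk with `steps` steps, time
    rescaled by 1/steps. Returns tile lengths in left-to-right order; the last tile
    is the unfinished excursion straddling time 1.
    """
    position, last_zero, tiles = 0, 0, []
    for k in range(1, steps + 1):
        position += 1 if rng.random() < 0.5 else -1
        if position == 0:                          # a zero of the walk closes a tile
            tiles.append((k - last_zero) / steps)
            last_zero = k
    if last_zero < steps:
        tiles.append((steps - last_zero) / steps)  # final, unfinished tile
    return tiles
```

With an odd number of steps the walk cannot sit at 0 at the final time, so the last tile is always the unfinished excursion. By Lévy's arcsine law the last zero before time 1 has mean 1/2, so the average length of this last tile over many runs should be close to 1/2.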
1.5 Tauberian theory of random partitions
1.5.1 Some general theory 

Let $\Pi$ be an exchangeable random partition with ranked frequencies $(P_1, P_2, \ldots)$, which we assume has no dust almost surely. In applications to population genetics, we will often be interested in exact asymptotics of the following quantities:

1. $K_n$, which is the number of blocks of $\Pi_n$ (the restriction of $\Pi$ to [n]).

2. $K_{n,r}$, which is the number of blocks of size r, $1 \le r \le n$.

Obtaining asymptotics for $K_n$ is usually easier than for $K_{n,r}$, for instance due to monotonicity in n. But there is a very nice result which relates, in a surprisingly precise fashion, the asymptotics of $K_{n,r}$ (for any fixed $r \ge 1$, as $n \to \infty$) to those of $K_n$. We stress that this property is of course a consequence of the exchangeability of $\Pi$ and Kingman's representation. The asymptotic behaviour of these two quantities is further tied to another quantity, namely the asymptotic speed of decay of the frequencies towards 0. The right tool for proving these results is a variation of Tauberian theorems, which take a particularly elegant form in this context. The main result of this section (Theorem 1.11) is taken from [91], which also contains several other very nice results.

Theorem 1.11. Let $0 < \alpha < 1$. There is equivalence between the following properties:

(i) $P_j \sim Z j^{-1/\alpha}$ almost surely as $j \to \infty$, for some $Z > 0$.

(ii) $K_n \sim D n^{\alpha}$ almost surely as $n \to \infty$, for some $D > 0$.

Furthermore, when this happens, Z and D are related through

$$Z^{\alpha} = \frac{D}{\Gamma(1-\alpha)},$$

and we have:

(iii) For any $r \ge 1$,

$$K_{n,r} \sim \frac{\alpha\,\Gamma(r-\alpha)}{r!\,\Gamma(1-\alpha)}\,D n^{\alpha} \quad \text{as } n \to \infty.$$
The result of [91] is actually more general, and is valid if one replaces D by a slowly varying sequence $\ell_n$. Recall that a function f is slowly varying near $\infty$ if for every $\lambda > 0$,

$$\lim_{x\to\infty}\frac{f(\lambda x)}{f(x)} = 1. \qquad (16)$$

The prototypical example of a slowly varying function is the logarithm function. Any function f which may be written as $f(x) = x^{\alpha}\ell(x)$, where $\ell(x)$ is slowly varying, is said to have regular variation with index $\alpha$. A sequence $c_n$ is regularly varying with index $\alpha$ if there exists f(x) such that $c_n = f(n)$ and f is regularly varying with index $\alpha$, near $\infty$.

Proof (sketch). The main idea is to start from Kingman's representation theorem, to imagine that the $P_j$ are given, and then to see $\Pi_n$ as the partition generated by sampling with replacement from $(P_j)$. Thus in this proof, we work conditionally on $(P_j)$, and all expectations are (implicitly) conditional on these frequencies.

Rather than looking at the partition obtained after n samples, it is more convenient to look at it after N(n) samples, where N(n) is a Poisson random variable with mean n. The thinning (colouring) property of Poisson processes implies that one can imagine that each block j with frequency $P_j$ is discovered (i.e., sampled) at rate $P_j$, independently of the other blocks. Since we assume that there is no dust, this means $\sum_{j\ge1} P_j = 1$ almost surely, and hence the total rate of discoveries is indeed 1. Let K(t) be the total number of blocks of the partition at time t, and let $K_r(t)$ be the total number of blocks of size r at time t. Standard Poissonization arguments imply:

$$\frac{K(n)}{K_n} \to 1 \quad \text{a.s.}$$

and

$$\frac{K_r(n)}{K_{n,r}} \to 1 \quad \text{a.s.}$$

That is, we may as well look for the asymptotics in continuous Poisson time rather than in discrete time. For this we will use the following law of large numbers, proved in Proposition 2 of [91].

Lemma 1.1. For arbitrary $(P_j)$,

$$\frac{K(t)}{E(K(t))} \to 1 \quad \text{a.s. as } t \to \infty. \qquad (17)$$
Proof. The proof is fairly simple and we reproduce the arguments of [91]. Recall that we work conditionally on $(P_j)$, so all the expectations in the proof of this lemma are (implicitly) conditional on these frequencies. Note first that if $\Phi(t) = E(K(t))$, then

$$\Phi(t) = \sum_{j\ge1}\left(1 - e^{-P_j t}\right),$$

and similarly if $V(t) = \operatorname{var} K(t)$, we have (since K(t) is the sum of independent Bernoulli variables with parameters $1 - e^{-P_j t}$):

$$V(t) = \sum_{j\ge1}\left(1 - e^{-P_j t}\right)e^{-P_j t} = \Phi(2t) - \Phi(t).$$

But note that $\Phi$ is subadditive: indeed, by stationarity of Poisson processes, the expected number of blocks discovered during $(t, t+s]$ is $\Phi(s)$, but some of those blocks discovered during the interval $(t, t+s]$ were in fact already known prior to t, and hence don't count in $K(t+s)$. Thus $\Phi(t+s) \le \Phi(t) + \Phi(s)$, so that

$$V(t) \le \Phi(t),$$

and by Chebyshev's inequality:

$$P\left(\left|\frac{K(t)}{\Phi(t)} - 1\right| > \varepsilon\right) \le \frac{V(t)}{\varepsilon^2\,\Phi(t)^2} \le \frac{1}{\varepsilon^2\,\Phi(t)}.$$

Taking a subsequence $t_m$ such that $m^2 \le \Phi(t_m) < (m+1)^2$ (which is always possible), we find:

$$P\left(\left|\frac{K(t_m)}{\Phi(t_m)} - 1\right| > \varepsilon\right) \le \frac{1}{\varepsilon^2 m^2}.$$

Hence by the Borel-Cantelli lemma, $K(t_m)/\Phi(t_m) \to 1$ almost surely as $m \to \infty$. Using monotonicity of both $\Phi(t)$ and $K(t)$, we deduce that, for $t_m \le t \le t_{m+1}$,

$$\frac{K(t_m)}{\Phi(t_{m+1})} \le \frac{K(t)}{\Phi(t)} \le \frac{K(t_{m+1})}{\Phi(t_m)}.$$

Since $\Phi(t_{m+1})/\Phi(t_m) \to 1$, this means both the left-hand side and the right-hand side of the inequality tend to 1 almost surely as $m \to \infty$. Thus (17) follows. □
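The quantities in the proof of Lemma 1.1 are easy to compute for a given sequence $(P_j)$. In the sketch below, the frequencies $P_j \propto j^{-2}$, truncated at J blocks, are a hypothetical example chosen only for illustration. The helper `poissonized_block_count` (our name) draws K(t) as a sum of independent Bernoulli variables and returns it along with $\Phi(t)$ and $V(t)$.

```python
import math
import random


def poissonized_block_count(P, t, rng):
    """Return (K(t), Phi(t), V(t)) for the Poissonized sampling scheme of Lemma 1.1.

    Block j is discovered by time t with probability 1 - exp(-P_j t), independently
    of the other blocks, so K(t) is a sum of independent Bernoulli variables.
    """
    q = [-math.expm1(-p * t) for p in P]      # 1 - exp(-P_j t), computed stably
    K = sum(1 for qj in q if rng.random() < qj)
    phi = sum(q)
    V = sum(qj * (1.0 - qj) for qj in q)      # equals Phi(2t) - Phi(t)
    return K, phi, V


# Hypothetical example: P_j proportional to j^(-2), truncated at J blocks.
J = 200_000
weights = [j ** -2.0 for j in range(1, J + 1)]
Z = 1.0 / sum(weights)
P = [Z * w for w in weights]
K, phi, V = poissonized_block_count(P, 1e7, random.Random(0))
```

For this example $\Phi(10^7)$ is a few thousand, and the Chebyshev bound $V(t)/\Phi(t)^2 \le 1/\Phi(t)$ suggests that `K / phi` should be within a few percent of 1.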
Once we know Lemma 1.1, note that

$$E K(t) = \Phi(t) = \int_0^\infty \left(1 - e^{-tx}\right)\nu(dx),$$

where

$$\nu(dx) := \sum_{j\ge1}\delta_{P_j}(dx).$$

Fubini's theorem implies:

$$E K(t) = t\int_0^\infty e^{-tx}\,\bar\nu(x)\,dx \qquad (18)$$

where $\bar\nu(x) = \nu([x,\infty))$, so the equivalence between (i) and (ii) follows from classical Tauberian theory for the monotone density $\bar\nu(x)$, together with (17). That this further implies (iii) is a consequence of the fact that

$$E K_r(t) = \frac{t^r}{r!}\int_0^\infty x^r e^{-tx}\,\nu(dx) = \frac{t^r}{r!}\int_0^\infty e^{-tx}\,\nu_r(dx), \qquad (19)$$

where we have denoted

$$\nu_r(dx) = \sum_{j\ge1} P_j^r\,\delta_{P_j}(dx).$$

Integrating by parts gives us:

$$\nu_r([0,x]) = -x^r\,\bar\nu(x) + r\int_0^x y^{r-1}\,\bar\nu(y)\,dy.$$

Thus, by application of Karamata's theorem [82] (Theorem 1, Chapter 9, Section 8), we get that the measure $\nu_r$ is also regularly varying, with index $r - \alpha$: assuming that $\bar\nu(x) \sim \ell(x)x^{-\alpha}$ as $x \to 0$,

$$\nu_r([0,x]) \sim \frac{\alpha}{r-\alpha}\,x^{r-\alpha}\ell(x),$$

and by application of a Tauberian theorem to (19), we get that:

$$E K_r(t) \sim \frac{\alpha\,\Gamma(r-\alpha)}{r!}\,t^{\alpha}\,\ell(1/t). \qquad (20)$$
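As a numerical sanity check of (18) to (20), one can evaluate the exact Poissonised means for frequencies decaying like $j^{-1/\alpha}$ and compare them with the predicted constants: if $P_j \sim Z j^{-1/\alpha}$ then $\bar\nu(x) \sim Z^\alpha x^{-\alpha}$, so $\ell$ is the constant $Z^\alpha$. The sketch below is only an illustration under that assumption; the truncation level J and the helper name `tauberian_check` are our own.

```python
import math


def tauberian_check(alpha, t, J, r_values=(1, 2, 3)):
    """Compare exact Poissonised means with the asymptotics behind Theorem 1.11.

    Hypothetical frequencies: P_j = Z j^(-1/alpha) for j <= J, normalised to sum to 1,
    so that nu_bar(x) ~ Z^alpha x^(-alpha), i.e. ell is the constant Z^alpha.
    """
    weights = [j ** (-1.0 / alpha) for j in range(1, J + 1)]
    Z = 1.0 / sum(weights)
    P = [Z * w for w in weights]
    # Exact means: E K(t) = sum_j (1 - e^{-P_j t}),  E K_r(t) = sum_j e^{-P_j t} (P_j t)^r / r!
    ek = sum(-math.expm1(-p * t) for p in P)
    ekr = {r: sum(math.exp(-p * t) * (p * t) ** r for p in P) / math.factorial(r)
           for r in r_values}
    # Predictions: E K(t) ~ Gamma(1-alpha) Z^alpha t^alpha, and (20) with ell = Z^alpha.
    pred_k = math.gamma(1 - alpha) * Z ** alpha * t ** alpha
    pred_kr = {r: alpha * math.gamma(r - alpha) / math.factorial(r) * Z ** alpha * t ** alpha
               for r in r_values}
    return ek, pred_k, ekr, pred_kr
```

For instance `tauberian_check(0.5, 1e6, 200_000)` should give ratios `ek / pred_k` and `ekr[r] / pred_kr[r]` within about one percent of 1.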
A refinement of the method used in Lemma 1.1 shows that

$$\frac{K_r(n)}{E\big(K_r(n) \mid (P_j)_{j\ge1}\big)} \to 1 \quad \text{a.s.} \qquad (21)$$

in that case. Putting together (20) and (21), we obtain (iii). □

As an aside, note that (as pointed out in [91]), (21) need not hold for general $(P_j)$, as it might not even be the case that $E(K_r(n)) \to \infty$.

1.5.2 Example 

As a prototypical example of a partition $\Pi$ which verifies the assumptions of Theorem 1.11, we have the Poisson-Dirichlet$(\alpha, 0)$ partition.

Theorem 1.12. Let $\Pi$ be a PD$(\alpha, 0)$ random partition. Then there exists a random variable S such that

$$\frac{K_n}{n^{\alpha}} \to S$$

almost surely. Moreover S has the Mittag-Leffler distribution:

$$P(S \in ds) = \frac{1}{\pi\alpha}\sum_{k=1}^{\infty}\frac{(-1)^{k+1}}{k!}\,\Gamma(\alpha k + 1)\,s^{k-1}\sin(\pi\alpha k)\,ds.$$

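Proof. We start by showing that $n^{\alpha}$ is the right order of magnitude for $K_n$. First, we remark that the expectation $u_n = E(K_n)$ satisfies, by the Chinese restaurant process construction of $\Pi$,

$$u_{n+1} - u_n = E\left(\frac{\alpha K_n}{n}\right) = \frac{\alpha\,u_n}{n}.$$

This implies, using $u_1 = 1$ and the formula $\Gamma(x+1) = x\Gamma(x)$ (for $x > 0$):

$$u_{n+1} = u_n\left(1 + \frac{\alpha}{n}\right) = \left(1 + \frac{\alpha}{n}\right)\left(1 + \frac{\alpha}{n-1}\right)\cdots\left(1 + \frac{\alpha}{1}\right)u_1 = \frac{\Gamma(n+1+\alpha)}{\Gamma(n+1)\,\Gamma(1+\alpha)}.$$

Thus, using the asymptotics $\Gamma(x+a) \sim x^{a}\,\Gamma(x)$,

$$u_n = \frac{\Gamma(n+\alpha)}{\Gamma(n)\,\Gamma(1+\alpha)} \sim \frac{n^{\alpha}}{\Gamma(1+\alpha)}.$$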
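The recursion and the closed form for $u_n$ can be compared numerically. In this sketch the helper names are our own, and `math.lgamma` is used to evaluate the Gamma ratio without overflow.

```python
import math


def expected_blocks_recursion(n, alpha):
    """u_n = E(K_n) for PD(alpha, 0) via u_{m+1} = u_m (1 + alpha/m), u_1 = 1."""
    u = 1.0
    for m in range(1, n):
        u *= 1.0 + alpha / m
    return u


def expected_blocks_closed_form(n, alpha):
    """u_n = Gamma(n + alpha) / (Gamma(n) Gamma(1 + alpha)), via log-gamma to avoid overflow."""
    return math.exp(math.lgamma(n + alpha) - math.lgamma(n) - math.lgamma(1 + alpha))
```

For $n = 10^5$ and $\alpha = 1/2$, the ratio of `expected_blocks_closed_form(n, 0.5)` to $n^{1/2}/\Gamma(3/2)$ is already within $10^{-3}$ of 1.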
(This appears on p. 69 of [133], but using a more combinatorial approach.)

This tells us the order of magnitude of $K_n$. To deduce the almost sure behaviour, a martingale argument is needed (note that we may not apply Lemma 1.1, as this result is only conditional on the frequencies $(P_j)_{j\ge1}$ of $\Pi$). This is outlined in Theorem 3.8 of [133]. □

Later (see, e.g., Theorem 4.2), we will see other applications of 
this Tauberian theory to a concrete example arising in population 
genetics. 
2 Kingman's coalescent

In this chapter, we introduce Kingman's coalescent and study its 
first properties. This leads us to the notion of coming down from 
infinity, which is a "big bang" like phenomenon whereby a partition 
consisting of pure dust coagulates instantly into solid fragments. We 
show the relevance of Kingman's coalescent to population models by 
studying its relationship to the Moran model and the Wright-Fisher 
diffusion, and state a result of universality known as Möhle's lemma.
We derive some theoretical and practical implications of this relation- 
ship, such as the notion of duality between Kingman's coalescent and 
the Wright-Fisher diffusion. We then show that the Poisson-Dirichlet 
distribution describes the allelic partition associated with Kingman's 
coalescent. As a consequence, Ewens's sampling formula describes 
the typical genetic variation (or polymorphism in biological terms) 
of a sample of a population. This result is one of the cornerstones of 
mathematical population genetics, and we show a few applications. 

2.1 Definition and construction 
2.1.1 Definition 

Kingman's coalescent is perhaps the simplest stochastic process of coalescence. It is easier to define it as a process with values in $\mathcal{P}$, although by Kingman's correspondence there is an equivalent version in $\mathcal{S}_0$. Let $n \ge 1$. We start by defining a process $(\Pi^n_t, t \ge 0)$ with values in the space $\mathcal{P}_n$ of partitions of $[n] = \{1, \ldots, n\}$. This process is defined by saying that:

1. Initially $\Pi^n_0$ is the trivial partition in singletons.

2. $\Pi^n$ is a strong Markov process in continuous time, where the transition rates $q(\pi, \pi')$ are as follows: they are positive if and only if $\pi'$ is obtained from merging two blocks of $\pi$, in which case $q(\pi, \pi') = 1$.

To put it in words, $\Pi^n$ is a process which starts with a totally fragmented state, and which evolves with (binary) coalescences. The evolution may be described by saying that every pair of blocks merges at rate 1, independently of their size. Because of this last property, one may think of a block as a particle. Each pair of particles merges at rate 1, regardless of any additional structure. When two particles merge, the pair is replaced by a new particle which is indistinguishable from any other particle. $\Pi^n$ is sometimes referred to as Kingman's n-coalescent or simply an n-coalescent (the definition of Kingman's (infinite) coalescent is delayed to Proposition 2.1).
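The description above translates directly into a simulation. Below is a minimal sketch; the function name and the `(time, blocks)` output format are our own choices. With k blocks it waits an exponential time with rate $k(k-1)/2$, then merges a uniformly chosen pair.

```python
import random


def kingman_n_coalescent(n, rng):
    """Simulate Kingman's n-coalescent.

    Returns a list of (time, blocks) pairs, one for the initial state and one after
    each merger. With k blocks, each of the k(k-1)/2 pairs merges at rate 1, so the
    next merger happens after an Exp(k(k-1)/2) time and involves a uniform pair.
    """
    blocks = [frozenset([i]) for i in range(1, n + 1)]
    t = 0.0
    history = [(t, list(blocks))]
    while len(blocks) > 1:
        k = len(blocks)
        t += rng.expovariate(k * (k - 1) / 2)      # waiting time until the next merger
        i, j = rng.sample(range(k), 2)              # uniformly chosen pair of blocks
        merged = blocks[i] | blocks[j]
        blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)] + [merged]
        history.append((t, list(blocks)))
    return history
```

Since the holding time with k blocks has mean $2/(k(k-1))$, the time at which a single block remains has mean $\sum_{k=2}^{n} 2/(k(k-1)) = 2(1 - 1/n)$. Averaging `kingman_n_coalescent(n, rng)[-1][0]` over many runs should reproduce this.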

Consistency. A trivial but important property of Kingman's n-coalescent is that of consistency: if we consider the natural restriction of $\Pi^n$ to partitions in $\mathcal{P}_m$, where $m < n$, we obtain a new random process $\Pi^{m,n}$. The claim is that the distribution of $\Pi^{m,n}$ is exactly the law of Kingman's m-coalescent (and is thus independent of n). This is not a priori obvious, as the projection of a Markov process need not even stay Markov. However, it is easy and elementary to verify the claim: each pair of blocks of the restricted partition merges at rate 1, while a merger involving a block disjoint from [m] leaves the restriction unchanged.

One important consequence of this property is, by Kolmogorov's extension theorem, the following:

Proposition 2.1. There exists a unique in law process $(\Pi_t, t \ge 0)$ with values in $\mathcal{P}$, such that the restriction of $\Pi$ to $\mathcal{P}_n$ is an n-coalescent. $(\Pi_t, t \ge 0)$ is called Kingman's coalescent.

To see how this follows from Kolmogorov's extension theorem, note that a partition $\pi$ of $\mathbb{N}$ may be regarded as a function from $\mathbb{N}$ into itself: it suffices to assign to every integer i the smallest integer in the same block of $\pi$ as i. Hence a coalescing partition process $(\Pi_t, t \ge 0)$ may formally be viewed as a process indexed by $\mathbb{N}$, taking its values in a suitable space E of paths. The consistency property above guarantees that the cylinder restrictions (i.e., the finite-dimensional distributions) of this process are consistent, which in turn makes it possible to use Kolmogorov's extension theorem to yield Proposition 2.1.

Quite apart from this "general abstract nonsense", the consistency property also suggests a simple probabilistic construction of Kingman's coalescent, which we now indicate. This construction is in the manner of graphical constructions for models such as the voter model (see, e.g., [115] or Theorem 5.3 in these notes), and serves as a model for the more sophisticated future constructions of particle systems based on Fleming-Viot processes. The idea is to label every block B of the partition $\Pi(t)$ by its lowest element. That is, we construct for every $i \ge 1$ a label process $(X_t(i), t \ge 0)$, where $X_t(i) = j$ means that at time t, the lowest element of the block containing i is equal to j. Thus $X_t(i)$ has the properties that $X_0(i) = i$ for every $i \ge 1$, and $X_t(i)$ can only jump downwards, at times of a coalescence event involving the block containing i. At each such event $X_t(i)$ jumps to the lowest element j such that $j \sim_{\Pi(t+)} i$. The point is that $(X_t(i), t \ge 0)$ can be constructed for every $i \ge 1$ simultaneously, as follows. For every $i < j$, let $T_{ij}$ be an exponential random variable with rate 1, all of these being independent. To define $X_t(n)$, there is no problem in making the above informal description rigorous: indeed, to define $X_t(n)$, it suffices to look at the exponential random variables $T_{ij}$ associated with $1 \le i < j \le n$, as the $T_{ij}$ with $j > n$ cannot affect $X_t(n)$. Thus there can never be any accumulation point of the $T_{ij}$, since there are only finitely many such variables to be considered.

[More formally, let $T_1 = \inf\{T_{ij} : 1 \le i < j \le n\}$, and define recursively

$$T_{k+1} = \inf\{T_{ij} : 1 \le i < j \le n,\ T_{ij} > T_k\}.$$

Thus $(T_1, T_2, \ldots)$ is the sequence of times at which there is a potential coalescence. Let $i_k, j_k$ be defined by $T_k = T_{i_k j_k}$. Define $X_t(i) = i$ for all $t < T_1$. Inductively now, if $k \ge 1$ and $X_t(i)$ is defined for all $1 \le i \le n$ and all $t < T_k$, let I be the set of particles whose label changes at time $T_k$:

$$I = \{i \in [n] : X_{T_k-}(i) = j_k\}.$$

Define $X_t(i) = X_{T_k-}(i)$ if $i \notin I$, and put $X_t(i) = i_k$ if $i \in I$, for all $t \in [T_k, T_{k+1})$.]

Once the label process $(X_t(i), t \ge 0)$ is defined simultaneously for all $i \ge 1$, we can define a partition $\Pi(t)$ by putting:

$$i \sim_{\Pi(t)} j \quad \text{if and only if} \quad X_t(i) = X_t(j). \qquad (22)$$

Moreover, it is obvious from the above description that the dynamics of $(\Pi(t), t \ge 0)$ restricted to $\mathcal{P}_n$ is that of an n-coalescent. Thus (22) is a realisation of Kingman's coalescent. Note that despite the labelling process, which seems to favour lower labels rather than upper labels, the partition $\Pi(t)$ is, for every $t \ge 0$, exchangeable. Indeed, let $\sigma$ be a permutation with finite support, and let $n \ge 1$ be such that [n] contains this support. From the original description of an n-coalescent, it is plain that the law of $\Pi^n(t)$ is invariant under $\sigma$. Hence $\Pi(t)$ is exchangeable.
2.1.2 Coming down from infinity.

We are now ready to describe what is one of Kingman's coalescent's most striking features, which is that it comes down from infinity. As we will see, this phenomenon states that, although initially the partition is only made of singletons, after any positive amount of time, the partition contains only a finite number of blocks almost surely, which (by exchangeability) must all have positive asymptotic frequency (in particular, there is no dust almost surely anymore, as otherwise the singletons would contribute an infinite number of blocks). Thus, let $N_t$ denote the number of blocks of $\Pi(t)$.

Theorem 2.1. Let E be the event that for all $t > 0$, $N_t < \infty$. Then $P(E) = 1$.

In words, coalescence is so strong that all dust has coagulated into a finite number of solid blocks. We say that Kingman's coalescent comes down from infinity. This is a big-bang-like event, which is indeed reminiscent of models in astrophysics.

Proof. The proof of this result is quite easy, but we prefer to first give an intuitive explanation for why the result holds true. Note that the time it takes to go from n blocks to $n-1$ blocks is just an exponential random variable with rate $n(n-1)/2$. When n is large, this rate is approximately $n^2/2$, so we can expect the number of blocks $u(t)$ to approximately solve the differential equation:

$$u'(t) = -\frac{u(t)^2}{2}, \qquad u(0) = \infty. \qquad (23)$$

(23) has a well-defined solution $u(t) = 2/t$, which is finite for all $t > 0$ but infinite for $t = 0$. This explains why $N_t$ is finite almost surely for all $t > 0$. In fact, one guesses from the ODE approximation:

$$\lim_{t\to0} t N_t = 2 \qquad (24)$$

almost surely. This statement is correct indeed, but unfortunately it is tedious to make the ODE approximation rigorous. Instead, to show Theorem 2.1, we use the following simple argument. Since $N_t$ is nonincreasing in t, it is enough to show that $N_t < \infty$ almost surely for each fixed $t > 0$, that is, that for every $\varepsilon > 0$ there exists $M > 0$ such that $P(N_t > M) < \varepsilon$. For this, it suffices to look at the restrictions $\Pi^n$ of $\Pi$ to [n], and show that

$$\limsup_{n\to\infty} P(N^n_t > M) < \varepsilon. \qquad (25)$$

Here we used the notation $N^n_t$ for the number of blocks of $\Pi^n_t$. For every $j \ge 2$, let $E_j$ be an exponential random variable with rate $j(j-1)/2$, these variables being independent. The event $\{N^n_t > M\}$ means that the time needed to go from n blocks down to M blocks exceeds t, so by Markov's inequality:

$$P(N^n_t > M) = P\Big(\sum_{j=M+1}^{n} E_j > t\Big) \le \frac{1}{t}\sum_{j=M+1}^{n}\frac{2}{j(j-1)} \le \frac{2}{tM}.$$

The right-hand side of the above inequality is independent of n, and can be made as small as desired provided M is chosen large enough. Thus (25) follows. □
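The ODE heuristic (24) can be checked by simulating only the block-counting process of the n-coalescent, a pure-death chain whose holding time with k blocks is exponential with rate $k(k-1)/2$. This is a sketch with a helper name of our own; a large finite n stands in for the infinite coalescent, which is an approximation.

```python
import random


def blocks_at_time(n, t, rng):
    """Number of blocks N^n_t of Kingman's n-coalescent at time t."""
    k, elapsed = n, 0.0
    while k > 1:
        elapsed += rng.expovariate(k * (k - 1) / 2)   # holding time with k blocks
        if elapsed > t:
            return k                                  # still k blocks at time t
        k -= 1
    return 1


rng = random.Random(0)
t = 0.01
samples = [blocks_at_time(20_000, t, rng) for _ in range(40)]
estimate = t * sum(samples) / len(samples)            # heuristic (24): t * N_t -> 2
```

For t = 0.01 the value of `estimate` should be close to 2, as suggested by (24). The same helper can be used to see the bound $P(N^n_t > M) \le 2/(tM)$ at work.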



2.1.3 Aldous' construction 

We now provide two different constructions of Kingman's coalescent which have some interesting consequences. The first one is due to Aldous (section 4.2 in [5]). Let $(U_j)_{j\ge1}$ be a collection of i.i.d. uniform random variables on (0,1). Let $(E_j)_{j\ge2}$ be a collection of independent exponential random variables with rate $j(j-1)/2$, independent of the $U_j$, and let

$$\tau_j = \sum_{k=j+1}^{\infty} E_k < \infty.$$

Define a function $f : (0,1) \to \mathbb{R}$ by saying $f(U_j) = \tau_j$ for all $j \ge 1$, and $f(u) = 0$ if u is not one of the $U_j$'s. Define a tiling S(t) of (0,1) by looking at the open connected components of $\{u \in (0,1) : f(u) \le t\}$. See Figure 3 for an illustration.
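Aldous' construction can also be sketched in a few lines. Here the infinite sums defining $\tau_j$ are truncated at J sticks (so that $\tau_J = 0$), an approximation made only for the sake of the example; the helper name `aldous_tiles` is ours.

```python
import random


def aldous_tiles(t, J, rng):
    """Tile lengths of S(t) in Aldous' construction, truncated to J sticks.

    Approximation: tau_j = sum_{k=j+1}^{J} E_k with E_k ~ Exp(k(k-1)/2), so that
    tau_J = 0 and the (small) tail beyond J is dropped. The stick at U_j has
    height tau_j, and S(t) is (0, 1) cut at the sticks taller than t.
    """
    tau = [0.0] * (J + 1)                       # tau[j] for j = 1..J; tau[0] unused
    for j in range(J - 1, 0, -1):
        tau[j] = tau[j + 1] + rng.expovariate((j + 1) * j / 2)   # add E_{j+1}
    U = [rng.random() for _ in range(J + 1)]    # stick j sits at U[j]; U[0] unused
    cuts = sorted(U[j] for j in range(1, J + 1) if tau[j] > t)
    points = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(points, points[1:])]
```

The number of tiles at time t is one plus the number of sticks taller than t, so for small t it should be of order $2/t$, in line with (24).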

Theorem 2.2. $(S(t), t \ge 0)$ has the distribution of the asymptotic frequencies of Kingman's coalescent.

Figure 3: Aldous' construction. The vertical sticks are located at uniform random points on (0,1). The stick at $U_j$ has height $\tau_j$. These define a tiling of (0,1) as shown in the picture. The tiles coalesce as t increases from 0 to $\infty$.



Proof. We offer two different proofs, which are both instructive in their own ways. The first one is straightforward: in a first step, note that the transitions of S(t) are correct: when S(t) has n fragments, one has to wait an exponential amount of time with rate $n(n-1)/2$ before the next coalescence occurs, and when it does, given S(t), the pair of blocks which coalesces is uniformly chosen. (This follows from the fact that, given S(t), their linear order is uniform.) Once this has been observed, the second step is to argue that the asymptotic frequencies of Kingman's coalescent form a Feller process with an entrance law given by the "pure dust" state $S(0) = (0, 0, \ldots) \in \mathcal{S}_0$. (Naturally, this Feller property is meant in the sense of the usual topology on $\mathcal{S}_0$, i.e., not the restriction of the $\ell^1$ metric, but that determined by pointwise convergence of the non-dust entries.) This argumentation can be found for instance in [5, Appendix 10.5]. Since it is obvious that $S(t) \to (0, 0, \ldots)$ in that topology as $t \to 0$, we obtain the claim that S(t) has the distribution of the asymptotic frequencies of Kingman's coalescent.

The second proof is quite different, and less straightforward, but more instructive. Start with the observation that, for the finite n-coalescent, the set of successive states visited by the process, say $(\Pi_n, \Pi_{n-1}, \ldots, \Pi_1)$ (where for each $1 \le i \le n$, $\Pi_i$ has exactly i blocks), is independent from the holding times $(H_n, H_{n-1}, \ldots, H_2)$ (this is, of course, not true of a general Markov chain, but holds here because the holding time $H_k$ is an exponential random variable with rate $k(k-1)/2$ independent from $\Pi_k$). Letting $n \to \infty$ and considering these two processes backward in time, we obtain that for Kingman's coalescent the reverse chain $(\Pi_1, \Pi_2, \ldots)$ is independent from the holding times $(H_2, H_3, \ldots)$. It is obvious in the construction of S(t) that the holding times $(H_2, H_3, \ldots)$ have the correct distribution, hence it suffices to show that $(\Pi_1, \Pi_2, \ldots)$ has the correct distribution, where, for $k \ge 1$, $\Pi_k$ is the random partition generated from $S(\tau_k)$ by sampling at i.i.d. uniform random variables independent of the construction (here $\tau_k$ is a time at which S(t) has k blocks).

To this end, we introduce the notion of rooted segments. A rooted segment on k points $i_1, \ldots, i_k$ is one of the possible $k!$ linear orderings of these k points. We think of them as being oriented from left to right, the leftmost point being the root of the segment. If $n \ge 1$ and $1 \le k \le n$, consider the set $\mathcal{R}_{n,k}$ of all rooted segments on $\{1, \ldots, n\}$ with exactly k distinct connected components (the order of these k segments is irrelevant). We call such an element a broken rooted segment.

Lemma 2.1. The random partition associated with a uniform element of $\mathcal{R}_{n,k}$ has the same distribution as $\Pi_k$, where $(\Pi_k)_{n\ge k\ge1}$ is the set of successive states visited by Kingman's n-coalescent.

Proof. The proof is modeled after [24], but goes back to at least Kingman [107]. It is obvious that the partition associated with $\xi_n$, a random element of $\mathcal{R}_{n,n}$, has the same structure as $\Pi_n$ (as both these are singletons almost surely). Now, let $1 < k \le n$, let $\xi$ be a randomly chosen element of $\mathcal{R}_{n,k}$, and let $\xi'$ be obtained from $\xi$ by merging a random pair of clusters and choosing one of the two orders for the merged linear segment at random. Then we claim that $\xi'$ is uniform on $\mathcal{R}_{n,k-1}$. Indeed, if $\zeta \to \zeta'$ denotes the relation that $\zeta'$ can be obtained from $\zeta$ by merging two parts, we get, for any $\eta \in \mathcal{R}_{n,k-1}$:

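$$P(\xi' = \eta) = \sum_{\zeta\in\mathcal{R}_{n,k} :\ \zeta\to\eta}\frac{1}{|\mathcal{R}_{n,k}|}\cdot\frac{2}{k(k-1)}\cdot\frac{1}{2} = \frac{1}{k(k-1)\,|\mathcal{R}_{n,k}|}\,\big|\{\zeta \in \mathcal{R}_{n,k} : \zeta \to \eta\}\big|.$$

The point is that, given $\eta \in \mathcal{R}_{n,k-1}$, there are exactly $n-k+1$ ways to cut a link from it and obtain a $\zeta \in \mathcal{R}_{n,k}$ such that $\zeta \to \eta$. Note that there can be no repeat in this construction, and hence $|\{\zeta \in \mathcal{R}_{n,k} : \zeta \to \eta\}| = n-k+1$, which does not depend on $\eta$. In particular,

$$P(\xi' = \eta) = \frac{n-k+1}{k(k-1)\,|\mathcal{R}_{n,k}|}, \qquad (26)$$

and thus $\xi'$ is uniform on $\mathcal{R}_{n,k-1}$. □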






Figure 4: Cutting a rooted random segment. 

The lemma has the following consequence. It is easy to see that a random element of $\mathcal{R}_{n,k}$ may be obtained by choosing a random rooted segment on [n], and breaking it at $k-1$ uniformly chosen links. Rescaling the interval [0, n] to the interval (0,1) and letting $n \to \infty$, it follows from this argument that $\Pi_k$, which is the infinite partition of Kingman's coalescent when it has k blocks, has the same distribution as the partition obtained by sampling from the unit interval cut at $k-1$ uniform random points. This finishes the proof of Theorem 2.2. □
This theorem, and the discrete argument given in the second proof, have a number of useful consequences, which we now detail.

Corollary 2.1. Let $\tau_k$ be the first time that Kingman's coalescent has k blocks, and let $S(\tau_k)$ denote the asymptotic frequencies at this time, ranked in nonincreasing order. Then $S(\tau_k)$ is distributed uniformly over the ranked $(k-1)$-dimensional simplex:

$$\Big\{(s_1, \ldots, s_k) : s_1 \ge s_2 \ge \cdots \ge s_k \ge 0,\ \sum_{i=1}^{k} s_i = 1\Big\}.$$
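Corollary 2.1 says that $S(\tau_k)$ can be sampled by cutting (0,1) at $k-1$ independent uniform points and ranking the pieces. A short sketch follows (the helper name is ours). As a check we use the classical fact that the largest of the k resulting pieces has mean $\frac{1}{k}\left(1 + \frac12 + \cdots + \frac1k\right)$.

```python
import random


def ranked_frequencies_at_k_blocks(k, rng):
    """Sample S(tau_k) as in Corollary 2.1: cut (0, 1) at k-1 i.i.d. uniform points
    and rank the k pieces in nonincreasing order."""
    cuts = sorted(rng.random() for _ in range(k - 1))
    points = [0.0] + cuts + [1.0]
    return sorted((b - a for a, b in zip(points, points[1:])), reverse=True)
```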




We also emphasize that the discrete argument given in the second proof of Theorem 2.2 has the following nontrivial consequence for the time-reversal of Kingman's n-coalescent: it can be constructed as a Markov chain with "nice", i.e., explicit, transitions. Let $(\xi_1, \ldots, \xi_n)$ be a process such that $\xi_k \in \mathcal{R}_{n,k}$ for all $1 \le k \le n$, defined as follows: $\xi_1$ is a uniform rooted segment on [n]. Given $\xi_i$ with $1 \le i \le n-1$, define $\xi_{i+1}$ by cutting a randomly chosen link from $\xi_i$. (See Figure 4.)

Corollary 2.2. The time-reversal of $\xi$, that is, $(\xi_n, \xi_{n-1}, \ldots, \xi_1)$, has the same distribution as Kingman's n-coalescent in discrete time.
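The chain $(\xi_1, \ldots, \xi_n)$ is straightforward to simulate. The sketch below (helper name ours) returns only the partitions associated with $\xi_1, \ldots, \xi_n$, not the orderings within blocks. Cutting the links in a uniformly shuffled order is the same as cutting a uniformly chosen remaining link at each step. Each partition in the output refines the previous one, so reading the list backwards gives successive coalescences.

```python
import random


def rooted_segment_chain(n, rng):
    """Partitions induced by xi_1, ..., xi_n: xi_1 is a uniform rooted segment on [n],
    and xi_{i+1} is obtained from xi_i by cutting a uniformly chosen remaining link."""
    order = list(range(1, n + 1))
    rng.shuffle(order)                  # uniform rooted segment (linear order, root first)
    link_order = list(range(n - 1))     # link i joins order[i] and order[i + 1]
    rng.shuffle(link_order)             # cutting links in this order = uniform choice each step
    chain, cut = [], set()
    for step in range(n):
        if step > 0:
            cut.add(link_order[step - 1])
        blocks, current = [], [order[0]]
        for i in range(n - 1):
            if i in cut:                # a cut link closes the current block
                blocks.append(frozenset(current))
                current = []
            current.append(order[i + 1])
        blocks.append(frozenset(current))
        chain.append(set(blocks))
    return chain
```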

As a further consequence of this link, we get an interesting formula for the probability distribution of Kingman's coalescent:

Corollary 2.3. Let $1 \le k \le n$. Then for any partition of [n] with exactly k blocks, say $\pi = (B_1, B_2, \ldots, B_k)$, we have:

$$P(\Pi_k = \pi) = \frac{(n-k)!\,k!\,(k-1)!}{n!\,(n-1)!}\prod_{i=1}^{k}|B_i|!. \qquad (27)$$

Proof. The number of elements in $\mathcal{R}_{n,k}$ is easily seen to be

$$|\mathcal{R}_{n,k}| = \frac{n!}{k!}\binom{n-1}{k-1}. \qquad (28)$$

Indeed it suffices to choose $k-1$ links to break out of $n-1$, after having chosen one of $n!$ rooted segments on [n]. Ignoring the order of the clusters gives us (28). Since the same partition is obtained by permuting the elements in a cluster of the broken rooted segment, we obtain immediately (27). □
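For small n, (27) can be verified by brute force. The sketch below (helper names ours) enumerates all pairs (rooted segment on [n], set of $k-1$ cut links). Each element of $\mathcal{R}_{n,k}$ arises from exactly $k!$ such pairs, so this yields the exact law of the partition associated with a uniform element of $\mathcal{R}_{n,k}$, which is then compared with the right-hand side of (27) in exact rational arithmetic.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations
from math import comb, factorial, prod


def kingman_formula(n, blocks):
    """Right-hand side of (27) for a partition of [n] given as a collection of blocks."""
    k = len(blocks)
    const = Fraction(factorial(n - k) * factorial(k) * factorial(k - 1),
                     factorial(n) * factorial(n - 1))
    return const * prod(factorial(len(b)) for b in blocks)


def broken_segment_law(n, k):
    """Exact law of the partition induced by a uniform element of R_{n,k}.

    Every element of R_{n,k} arises from exactly k! pairs (rooted segment on [n],
    set of k-1 cut links), so uniform pairs give a uniform broken rooted segment.
    """
    counts = Counter()
    for order in permutations(range(1, n + 1)):
        for cuts in combinations(range(1, n), k - 1):
            bounds = (0,) + cuts + (n,)
            partition = frozenset(frozenset(order[a:b]) for a, b in zip(bounds, bounds[1:]))
            counts[partition] += 1
    total = factorial(n) * comb(n - 1, k - 1)
    return {partition: Fraction(c, total) for partition, c in counts.items()}
```

For n = 5 and every k from 1 to 5, the two laws agree and the probabilities sum to 1.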
It is possible to prove (27) directly on Kingman's coalescent by induction, which is the approach chosen by Kingman [107] (see also Proposition 2.1 of Bertoin [28]). However, this approach requires one to guess the formula beforehand, which is really not that obvious! Induction works, but doesn't explain at all why such a formula should hold true. In fact, miraculous cancellations take place and (27) may seem quite mysterious. Fortunately, the connection with rooted segments explains why this formula holds.

Alternatively, we note that, given Corollary 2.1, (27) can be obtained by conditioning on the frequencies of $\Pi_{\tau_k}$, which are obtained by breaking the unit interval $(0, 1)$ at $k-1$ independent uniform random points, and then sampling from this partition as in Kingman's representation theorem. These frequencies have a Dirichlet distribution (with all $k$ parameters equal to 1), so such integrals can be computed explicitly, and one finds (27).
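The integral alluded to can be sketched as follows (our own outline, assuming the frequencies $S = (S_1, \ldots, S_k)$ of $\Pi_{\tau_k}$, listed in random rather than ranked order, are Dirichlet$(1, \ldots, 1)$). The standard Dirichlet moment formula gives, for integers $m_j \ge 0$ with $\sum_j m_j = n$,

$$\mathbb{E}\Big[\prod_{j=1}^{k} S_j^{m_j}\Big] = \frac{\Gamma(k)}{\Gamma(n+k)} \prod_{j=1}^{k} \Gamma(m_j + 1) = \frac{(k-1)!\,\prod_{j=1}^{k} m_j!}{(n+k-1)!}.$$

A sample of $n$ points realises the $k$-block partition $\pi = (B_1, \ldots, B_k)$ exactly when the $k$ blocks land in $k$ distinct intervals, which can happen in $k!$ ways, so

$$\mathbb{P}\big(\Pi_{\tau_k}|_{[n]} = \pi\big) = \frac{k!\,(k-1)!\,\prod_{i=1}^{k} |B_i|!}{(n+k-1)!}.$$

This is proportional to $\prod_i |B_i|!$: its ratio to the right-hand side of (27) is $n!\,(n-1)!/\big((n+k-1)!\,(n-k)!\big)$, which does not depend on $\pi$, so normalising over the partitions of $[n]$ with exactly $k$ blocks recovers (27).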

Later, we will describe a construction of Kingman's coalescent 
in terms of a Brownian excursion (or, equivalently, of a Brownian 
continuum random tree), which is seemingly quite different. Both 
these constructions can be used to study some of the fine properties 
of Kingman's coalescent: see [5] and [16]. 

2.2 The genealogy of populations 

We now approach a theme which is a main motivation for the study of coalescence. We will see how, in a variety of simple population models, the genealogy of a sample from that population can be approximated by Kingman's coalescent. This will usually be formalized by taking a scaling limit as the population size $N$ tends to infinity, while the sample size $n$ is fixed but arbitrary. A striking feature of these results is that the limiting process, Kingman's coalescent, is to some degree universal, as shown in the upcoming Theorem 2.5. That is, its occurrence is largely insensitive to the microscopic details of the underlying probability model, much like Brownian motion is a universal scaling limit of random walks, or SLE is a universal scaling limit of a variety of critical planar models from statistical physics.

However, there are a number of important assumptions that must 
be made in order for this approximation to work. Loosely speaking, 
those are usually of the following kind: 

(1) Population of constant size, and individuals typically have few offspring.

(2) Population is well-mixed (or mean-field): everybody is liable to interact with anybody.

(3) No selection acts on the population. 

We will see how each of these assumptions is implemented in a model. For instance, a typical assumption corresponding to (1) is that the population size is constant and the number of offspring of a random individual has finite variance. Changing other parameters of the model (e.g., whether or not generations overlap) will not make any macroscopic difference, but changing any of those three points will usually affect the genealogy in essential ways. Indeed, much of the rest of the volume is devoted to studying coalescent processes in which some or all of those assumptions are invalidated. This will lead us in general to coalescents with multiple mergers, taking place in some physical space modeled by a graph. But we are getting ahead of ourselves, and for now we first present the basic theory of Kingman's coalescent.

2.2.1 A word of vocabulary 

Before we explain the Moran model in the next paragraph, we briefly explain a few notions from biology. From the point of view of applications, the samples concern not the individuals themselves, but usually some of their genetic material. Suppose one is interested in some specific gene (that is to say, to simplify, a piece of DNA which codes for a certain protein). Suppose we sample $n$ individuals from a population of size $N \gg n$. We will be interested in describing the genetic variation in this sample corresponding to this gene, that is, in quantifying how much diversity there is in the sample at this gene. Indeed, what typically happens is that several individuals share the exact same gene while others carry different variants. Different versions of the same gene are called alleles. Here we will implicitly assume that all alleles are selectively equivalent, i.e., natural selection doesn't favour a particular kind of allele (or rather, the individual which carries that allele).

To understand what we can expect of this variation, it turns out that the relevant thing to analyse is the ancestry of the genes we sampled and, more precisely, the genealogical relationships between these genes. To explain why this is so, imagine that all genes are very closely related, say our sample comes from members of one family. Then we expect little variation, as these individuals have a common ancestor not too far back in the past. Genes may have evolved from this ancestor, due to mutations, but since this ancestor is recent, we can expect few such changes. On the contrary, if our sample comes from individuals that are very distantly related (perhaps coming from different countries), then we expect a much larger variation.

Ancestral partition. It thus makes sense to analyse the genealogical tree of our sample. We usually do so by observing the ancestral partition process. Suppose that we have a certain population model of constant size $N$ which is defined on some interval of time $I = [-T, 0]$, where $T$ will usually be $\infty$. Then we can sample without replacement $n$ individuals from the population at time 0, say $x_1, \ldots, x_n$, with $n \le N$, and consider the random partition $\Pi^n_t$ such that $i \sim j$ if and only if $x_i$ and $x_j$ share the same ancestor at time $-t$. The process $(\Pi^n_t, 0 \le t \le T)$ is then a coalescent process. It is very important to realise that the direction of time for the coalescent process is the opposite of the direction of time for the "natural" evolution of the population.

Recalling that we only want the ancestry of the gene we are looking at, rather than that of the individual which carries it, greatly simplifies matters. Indeed, in diploid populations like humans (i.e., populations whose genome is made of a number of pairs of homologous chromosomes, 23 for humans), each gene comes from a single parent, as opposed to individuals, who come from two parents. Thus in our sample we have $n$ genes, and we can go back one generation in the past and ask who were the "parents" (i.e., the parent genes) of each of those $n$ genes. It may be that some of these genes share the same parent, e.g., in the case of siblings. In that case, the ancestral lineages corresponding to these genes have coalesced. Eventually, if we go far enough back into the past, all lineages from our initial $n$ genes will have coalesced into a most recent common ancestor, which we can call the ancestral Eve of our sample. Note that if we sample $n$ individuals from a diploid population such as humans, we actually have $2n$ genes, each with its own genealogical lineage. Thus from our point of view, there won't be any difference between haploid and diploid populations, except that the population size is in effect doubled. From now on, we will thus make no distinction between a gene and an individual.

Figure 5: Moran model and associated ancestral partition process. An arrow indicates a replacement; the direction shows where the lineage comes from. Here $N = 7$ and the sample consists of individuals 1, 3, 4, 5, 6. At time $t$, $\Pi_t = \{\{1, 3, 5\}, \{4, 6\}\}$, while at time $T$, $\Pi_T = \{\{1, 3, 4, 5, 6\}\}$.

2.2.2 The Moran and the Wright-Fisher models 

The Moran model is perhaps the simplest model which satisfies assumptions (1), (2) and (3). In it, there is a constant number $N$ of individuals in the population. Time is continuous, and every individual lives an exponential amount of time with rate 1. When an individual dies, it is simultaneously replaced by an offspring of another individual in the population, chosen uniformly at random among the other $N-1$ individuals. This keeps the population size constant, equal to $N$. This model is defined for all $t \in \mathbb{R}$. See the accompanying Figure 5 for an illustration. Note that all three assumptions are satisfied here, so it is no surprise that we have:

Theorem 2.3. Let $n \ge 1$ be fixed, and let $x_1, \ldots, x_n$ be $n$ individuals sampled without replacement from the population at time $t = 0$. For every $N \ge n$, let $\Pi^{N,n}_t$ be the ancestral partition obtained by declaring $i \sim j$ if and only if $x_i$ and $x_j$ have a common ancestor at time $-t$. Then, speeding up time by $(N-1)/2$, we find:

$$\big(\Pi^{N,n}_{(N-1)t/2},\ t \ge 0\big) \text{ is an } n\text{-coalescent.}$$


Proof. The model may for instance be constructed by considering $N$ independent stationary Poisson processes with rate 1, $(Z_t(i), -\infty < t < \infty)_{i=1}^{N}$. Each time $Z_t(i)$ rings, we declare that the $i$th individual in the population dies, and is replaced by an offspring of a randomly chosen individual in the rest of the population. Since the time-reversal of a stationary Poisson process is still a stationary Poisson process, we see that while there are $k \le n$ lineages that have not coalesced by time $-t$, each of them experiences, at rate 1, what was a death-and-replacement event in the opposite direction of time. At any such event, the corresponding lineage jumps to a randomly chosen other individual. With probability $(k-1)/(N-1)$, this individual carries one of the other $k-1$ lineages, in which case there is a coalescence. Thus the total rate at which there is a coalescence is $k(k-1)/(N-1)$. Hence speeding time up by $(N-1)/2$ gives us a total coalescence rate of $k(k-1)/2$, as it should be for an $n$-coalescent with $k$ blocks. □
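The backward-in-time argument in this proof can be simulated directly. The sketch below is our own illustration (the function name, seed and parameter values are arbitrary choices): it tracks only the number $k$ of distinct lineages, lets each lineage be hit by a death-and-replacement event at rate 1, and lets the hit lineage land on another sampled lineage with probability $(k-1)/(N-1)$. Multiplying the resulting time to the most recent common ancestor by $2/(N-1)$ should then give a sample from Kingman's $n$-coalescent, whose mean is $\sum_{k=2}^{n} 2/(k(k-1)) = 2(1 - 1/n)$.

```python
import random


def moran_tmrca(n, N, rng):
    """Backward time until n sampled lineages in a Moran model of size N coalesce."""
    k, t = n, 0.0
    while k > 1:
        t += rng.expovariate(k)  # k lineages, each hit at rate 1
        # the hit lineage jumps to a uniform other individual; coalescence if that
        # individual carries one of the other k - 1 lineages
        if rng.random() < (k - 1) / (N - 1):
            k -= 1
    return t


rng = random.Random(2024)
n, N, reps = 5, 50, 5000
scaled = [moran_tmrca(n, N, rng) * 2 / (N - 1) for _ in range(reps)]
print(sum(scaled) / reps, "vs Kingman mean", 2 * (1 - 1 / n))
```

Only the block count is simulated here because the rate argument in the proof does not depend on which lineages merge.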

In the Wright-Fisher model, the situation is similar, but the model is slightly different. The main difference is that generations are discrete and non-overlapping (as opposed to the Moran model, where different generations overlap). To describe this model, assume that the population at time $t \in \mathbb{Z}$ is made up of individuals $x_1, \ldots, x_N$. The population at time $t+1$ may be defined as $y_1, \ldots, y_N$, where for each $1 \le i \le N$, the parent of $y_i$ is chosen uniformly at random among $x_1, \ldots, x_N$, independently of the other individuals. Again, the model may be constructed for all $t \in \mathbb{Z}$. As above, all three conditions are intuitively satisfied, so we expect to get Kingman's coalescent as an approximation of the genealogy of a sample.

Theorem 2.4. Fix $n \ge 1$, and let $\Pi^{N,n}_t$ denote the ancestral partition at time $t$ of $n$ randomly chosen individuals from the population at time $t = 0$. That is, $i \sim j$ if and only if $x_i$ and $x_j$ share the same ancestor at time $-t$. Then as $N \to \infty$, keeping $n$ fixed and speeding up time by a factor $N$:

$$\big(\Pi^{N,n}_{\lfloor Nt \rfloor},\ t \ge 0\big) \xrightarrow{d} \big(\Pi^n_t,\ t \ge 0\big)$$

where $\xrightarrow{d}$ indicates convergence in distribution under the Skorokhod topology of $\mathbb{D}([0,\infty), \mathcal{P}_n)$, and $(\Pi^n_t, t \ge 0)$ is Kingman's $n$-coalescent.

Proof (sketch). Consider two randomly chosen individuals $x, y$. Then the time it takes for them to coalesce is geometric with success probability $p = 1/N$: indeed, at each new generation, the probability that the two genes go back to the same ancestor is $1/N$, since every gene chooses its parent uniformly at random and independently of one another. Let $T_N$ be a geometric random variable with parameter $1/N$. Since

$$\frac{T_N}{N} \xrightarrow{d} E \quad \text{as } N \to \infty,$$

where $E$ is an exponential random variable with parameter 1, we see that the pair $(x, y)$ coalesces at rate approximately 1 once time is sped up by $N$. This is true for every pair. Moreover, the probability that three given lineages, or two disjoint pairs of lineages, merge in the same generation is of order $1/N^2$, which is negligible on this time scale; hence we get Kingman's $n$-coalescent.

□
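The ancestral partition of Theorem 2.4 is easy to simulate from its definition. In the sketch below (our own illustration, with hypothetical function names, seed and parameters), each distinct ancestor picks its parent uniformly among the $N$ individuals of the previous generation, so lineages that have already merged stay merged, and $\Pi^{N,n}_t$ is read off by grouping the sample by current ancestor. Under the scaling of Theorem 2.4, the mean time to the most recent common ancestor divided by $N$ should be close to $2(1 - 1/n)$, the mean for Kingman's $n$-coalescent.

```python
import random


def wright_fisher_step(ancestors, N, rng):
    """Go back one generation: every distinct ancestor picks a uniform parent."""
    parent_of = {a: rng.randrange(N) for a in sorted(set(ancestors))}
    return [parent_of[a] for a in ancestors]


def ancestral_partition(ancestors):
    """Blocks of sample labels 1..n sharing the same ancestor (i ~ j)."""
    blocks = {}
    for i, a in enumerate(ancestors, start=1):
        blocks.setdefault(a, []).append(i)
    return sorted(blocks.values())


def wf_tmrca(n, N, rng):
    """Generations back until the ancestral partition has a single block."""
    ancestors = rng.sample(range(N), n)  # n distinct individuals at time 0
    t = 0
    while len(ancestral_partition(ancestors)) > 1:
        ancestors = wright_fisher_step(ancestors, N, rng)
        t += 1
    return t


rng = random.Random(7)
n, N, reps = 5, 100, 1000
mean_scaled = sum(wf_tmrca(n, N, rng) for _ in range(reps)) / (reps * N)
print(mean_scaled, "vs Kingman mean", 2 * (1 - 1 / n))
```

For finite $N$ the agreement is only approximate, as the theorem is a limit statement.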

We briefly comment that this is the general structure of limit theorems on the genealogy of populations: $n$ is fixed but arbitrary, $N$ tends to infinity, and after speeding up time by a suitable factor, we get convergence towards the restriction to $n$ particles of a nice coalescing process.

Despite their simplicity, the Wright-Fisher and Moran models have proved extremely useful for understanding some theoretical properties of Kingman's coalescent, such as the duality relation which will be discussed in the subsequent sections of this chapter. However, before that, we will discuss an important result, due to Möhle, which gives convergence towards Kingman's coalescent in the above sense for a wide class of population models known as Cannings models, and may thus be viewed as a universality result.

2.2.3 Möhle's lemma

We now describe the general class of population models which is the framework of Möhle's lemma, and which are known as Cannings models (after the work of Cannings [50, 51]). As the reader has surely guessed, we will first impose that the population size stays constant, equal to $N > 1$, and we label the individuals of this population $1, \ldots, N$. To define this model, consider a sequence of exchangeable integer-valued random variables $(\nu_1, \ldots, \nu_N)$, which have the property that

$$\sum_{i=1}^{N} \nu_i = N. \qquad (29)$$

The $\nu_i$ have the following interpretation: at every generation, all individuals reproduce and leave a certain number of offspring in the next generation. We call $\nu_i$ the number of offspring of individual $i$. Note that once a distribution is specified for the law of $(\nu_i)_{i=1}^{N}$, a population model may be defined on a bi-infinite set of times $t \in \mathbb{Z}$ by using i.i.d. copies $\{(\nu_i(t))_{i=1}^{N}, t \in \mathbb{Z}\}$. The requirement (29) corresponds to the fact that the total population size stays constant, and the requirement that for every $t \in \mathbb{Z}$, $(\nu_i(t))_{i=1}^{N}$ forms an exchangeable vector corresponds to the fact that there are no spatial effects or selection: every individual is treated equally.

Having defined this population dynamics, we consider again the coalescing process obtained by sampling $n \le N$ individuals from the population at time 0, and considering their ancestral lineages: that is, let $(\Pi^{N,n}_t, t = 0, 1, \ldots)$ be the $\mathcal{P}_n$-valued process defined by putting $i \sim j$ if and only if individuals $i$ and $j$ share the same ancestor at generation $-t$. This is the ancestral partition process already considered in the Moran and Wright-Fisher models.

Before stating the result for the genealogy of this process, which is due to Möhle [122], we make one further definition: let

$$c_N = \frac{\mathbb{E}\big(\nu_1(\nu_1 - 1)\big)}{N - 1}. \qquad (30)$$

Note that $c_N$ is the probability that two individuals sampled randomly (without replacement) from generation 0 have the same parent at generation $-1$. Indeed, this probability $p$ may be computed by summing over the possible common parent of those two lineages and is thus equal to

$$p = \sum_{j=1}^{N} \mathbb{E}\left(\frac{\nu_j(\nu_j - 1)}{N(N-1)}\right) = \frac{\mathbb{E}\big(\nu_1(\nu_1 - 1)\big)}{N - 1} = c_N,$$

since $\mathbb{E}(\nu_j(\nu_j - 1)) = \mathbb{E}(\nu_1(\nu_1 - 1))$ by exchangeability. Thus $c_N$ is the probability of coalescence of any two lineages in a given generation. Note that if we wish to show convergence to a continuous-time coalescent process, $c_N$ (or rather $1/c_N$) gives us the correct time scale, as any two lineages will coalesce in a time of order 1 after speeding up time by $1/c_N$. We may now state the main result of this section:

Theorem 2.5 (Möhle's Lemma). Consider a Cannings model defined by i.i.d. copies $\{(\nu_i(t))_{i=1}^{N}, t \in \mathbb{Z}\}$. If

$$\frac{\mathbb{E}\big(\nu_1(\nu_1 - 1)(\nu_1 - 2)\big)}{N^2 c_N} \longrightarrow 0 \quad \text{as } N \to \infty, \qquad (31)$$

then $c_N \to 0$ and the genealogy converges to Kingman's coalescent.

The formal statement which is contained in the informal wording of the theorem is that $(\Pi^{N,n}_{\lfloor t/c_N \rfloor}, t \ge 0)$ converges to Kingman's $n$-coalescent for every $n \ge 1$.

Although the proof is not particularly difficult, we do not include it in these notes, and refer the interested reader to [122]. However, we do note that the numerator $\mathbb{E}(\nu_1(\nu_1-1)(\nu_1-2))/N^2$ in (31) is, up to a factor tending to 1, the probability that three given lineages merge in a given generation, while $c_N$ is the probability that two given lineages merge. Thus the purpose of (31) is to demand that the rate at which three or more lineages coalesce is negligible compared to the rate of pairwise mergers: this property is indeed necessary if we are to expect Kingman's coalescent in the limit. See Möhle [121] for other criteria similar to (31).
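To see what (30) and (31) measure, here is a small exact computation (our own sketch; the second model is a hypothetical extreme example, not one from the text). For the Wright-Fisher model the offspring number $\nu_1$ is Binomial$(N, 1/N)$, whose factorial moments are $\mathbb{E}(\nu_1(\nu_1-1)\cdots(\nu_1-r+1)) = N(N-1)\cdots(N-r+1)/N^r$; this gives $c_N = 1/N$ and a ratio in (31) equal to $(N-1)(N-2)/N^3 \to 0$. In the "sweepstakes" model, one uniformly chosen individual produces all $N$ offspring, so $c_N = 1$ and the ratio in (31) tends to 1: in every generation all lineages merge at once, and Kingman's coalescent cannot be the limit.

```python
from fractions import Fraction
from math import perm


def cannings_quantities(factorial_moment, N):
    """Return c_N from (30) and the ratio appearing in (31).

    factorial_moment(r) must return E[nu_1 (nu_1 - 1) ... (nu_1 - r + 1)].
    """
    c_N = Fraction(factorial_moment(2)) / (N - 1)
    ratio = Fraction(factorial_moment(3)) / (N ** 2 * c_N)
    return c_N, ratio


def wright_fisher(N):
    # nu_1 ~ Binomial(N, 1/N): E[(nu_1)_r] = N (N-1) ... (N-r+1) / N^r
    return cannings_quantities(lambda r: Fraction(perm(N, r), N ** r), N)


def sweepstakes(N):
    # nu_1 = N with probability 1/N, else 0: E[(nu_1)_r] = N (N-1) ... (N-r+1) / N
    return cannings_quantities(lambda r: Fraction(perm(N, r), N), N)


for N in (10, 100, 1000):
    print(N, wright_fisher(N), sweepstakes(N))
```

Exact rational arithmetic makes the limits visible without rounding: the Wright-Fisher ratio shrinks like $1/N$, while the sweepstakes ratio approaches 1.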

2.2.4 Diffusion approximation and duality 

Consider the Moran model discussed in Theorem 2.3, and assume that at some time $t$, say $t = 0$ without loss of generality, the population consists of exactly two types of individuals: those which carry allele $a$, say, and those which carry allele $A$. For instance, one may think that allele $a$ is a mutation which affects a fraction $0 < p < 1$ of individuals. How does this proportion evolve with time? What is the chance it will eventually invade the whole population?

From the description of the Moran model itself, it is easy to see that, in the next $dt$ units of time (with $dt$ infinitesimally small), if $X_t$ is the fraction of individuals with allele $a$ and $X_t = x$, then $X_{t+dt}$ equals:

$$\begin{cases} x + \frac{1}{N} & \text{with probability } Nx(1-x)\,dt + o(dt),\\ x - \frac{1}{N} & \text{with probability } Nx(1-x)\,dt + o(dt),\\ x & \text{with probability } 1 - 2Nx(1-x)\,dt + o(dt). \end{cases}$$

Indeed, $X_t$ may only change by $+1/N$ if an individual from the $A$ population dies (which happens at rate $N(1-x)$) and is replaced with an offspring of an individual from the $a$ population (which happens with probability approximately $x$). Hence the total rate at which $X_t$ increases by $1/N$ is $Nx(1-x)$. Similarly, the total rate of decrease by $1/N$ is $Nx(1-x)$, since for this to occur, an individual from the $a$ population must die and be replaced by an offspring of an individual from the $A$ population.
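Thus we see that the expected drift is

$$\mathbb{E}\big(dX_t \mid \sigma(X_s, 0 \le s \le t)\big) = 0,$$

and that

$$\operatorname{var}\big(dX_t \mid \sigma(X_s, 0 \le s \le t)\big) = \frac{2}{N} X_t(1 - X_t)\,dt + o(dt).$$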
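Both moments follow from the three transition probabilities above; as a short check, with $X_t = x$,

$$\mathbb{E}(dX_t \mid X_t = x) = \frac{1}{N}\,Nx(1-x)\,dt - \frac{1}{N}\,Nx(1-x)\,dt + o(dt) = o(dt),$$

$$\mathbb{E}\big((dX_t)^2 \mid X_t = x\big) = \frac{1}{N^2}\cdot 2Nx(1-x)\,dt + o(dt) = \frac{2}{N}\,x(1-x)\,dt + o(dt).$$

After speeding time up by $N/2$, the variance per unit time becomes $\frac{N}{2}\cdot\frac{2}{N}\,x(1-x) = x(1-x)$, which is the square of the diffusion coefficient in (32).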

By routine martingale arguments (such as in [75]), it is easy to conclude that, speeding time up by $N/2$, $X_t$ converges to a nondegenerate diffusion:

Theorem 2.6. Let $(X^N_t, t \ge 0)$ be the fraction of individuals carrying the $a$ allele at time $t$ in the Moran model, started from $X^N_0 = p \in (0,1)$. Then

$$\big(X^N_{Nt/2},\ t \ge 0\big) \xrightarrow{d} \big(X_t,\ t \ge 0\big)$$

in the Skorokhod topology of $\mathbb{D}(\mathbb{R}_+, \mathbb{R})$, where $X$ solves the stochastic differential equation:

$$dX_t = \sqrt{X_t(1 - X_t)}\,dW_t; \qquad X_0 = p \qquad (32)$$

and $W$ is a standard Brownian motion. (32) is called the Wright-Fisher diffusion.

Note that the Wright-Fisher diffusion (32) has infinitesimal generator

$$Lf(x) = \frac{1}{2}x(1-x)\frac{d^2 f}{dx^2}. \qquad (33)$$

Remark 2.1. In some texts different scalings are considered, usually due to the fact that the "real" population size for humans (or any diploid population) is $2N$ when the number of individuals in the population is $N$. These texts sometimes don't adjust the scaling of time accordingly, in which case the limiting diffusion is:

$$dX_t = \sqrt{\tfrac{1}{2}X_t(1 - X_t)}\,dW_t; \qquad X_0 = p$$

which is then also called the Wright-Fisher diffusion. This unimportant change of constant explains discrepancies with other texts.
As we will see, this diffusion approximation has many consequences for questions of practical importance, as several quantities of interest have exact formulae in this approximation (while in the discrete model, these quantities would often be hard or impossible to compute exactly). There are also some theoretical implications, of which the following is perhaps the most important. This is a relation of duality, in the sense used in the particle systems literature ([115]), between Kingman's coalescent and the Wright-Fisher diffusion. Intuitively, the Wright-Fisher diffusion describes the evolution of a subpopulation forward in time, while Kingman's coalescent describes ancestral lineages backward in time, so this relation is akin to a change of direction of time. The precise result is as follows:

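Theorem 2.7. Let $\mathbb{P}$ and $\mathbb{P}^*$ denote respectively the laws of a Wright-Fisher diffusion and of Kingman's coalescent. Then, for all $0 < p < 1$, for all $n \ge 1$ and all $t \ge 0$, we have:

$$\mathbb{E}_p\big((X_t)^n\big) = \mathbb{E}^*_n\big(p^{|\Pi_t|}\big) \qquad (34)$$

where $|\Pi_t|$ denotes the number of blocks of the random partition $\Pi_t$, and $\mathbb{E}^*_n$ refers to Kingman's coalescent started from $n$ singletons (i.e., Kingman's $n$-coalescent).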

Proof (sketch). Consider a Moran model with total population size $N > 1$, and consider a subpopulation of allele $a$ individuals obtained by flipping a coin for every individual with success probability $p$. Choose $n$ individuals at random out of the total population at time $Nt/2$. What is the chance of the event $E$ that these $n$ individuals carry the $a$ allele? On the one hand, this can be computed by going backward in time $Nt/2$ units of time: by Theorem 2.3, there are then approximately $|\Pi_t|$ ancestral lineages, where $\Pi$ is Kingman's $n$-coalescent, and each of them carries the $a$ allele with probability $p$, independently. Since descendants inherit the allele of their ancestor, the $n$ individuals all carry the $a$ allele if and only if each of these ancestors does, so

$$\mathbb{P}(E) \approx \mathbb{E}^*_n\big(p^{|\Pi_t|}\big).$$

On the other hand, by Theorem 2.6, at time $Nt/2$ we know that the proportion of $a$ individuals in the population is approximately $X_t$. Thus the probability of the event $E$ is, as $N \to \infty$, approximately

$$\mathbb{P}(E) \approx \mathbb{E}_p\big(X_t^n\big).$$

Equating the two approximations yields the result. □
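For $n = 2$ both sides of (34) can be computed by hand: $|\Pi_t|$ stays at 2 until an exponential time of rate 1, so $\mathbb{E}^*_2(p^{|\Pi_t|}) = p^2 e^{-t} + p(1 - e^{-t})$, and applying the generator (33) to $x^2$ gives $\frac{d}{dt}\mathbb{E}_p(X_t^2) = p - \mathbb{E}_p(X_t^2)$, whose solution is the same expression. The sketch below is our own illustration (step size, sample sizes and seeds are arbitrary): it estimates each side by simulation, using an Euler-Maruyama discretisation of (32) clipped to $[0, 1]$ and the block-counting process of Kingman's coalescent, and compares both estimates with this closed form.

```python
import math
import random


def kingman_side(n, p, t, rng, reps=20000):
    """Monte Carlo estimate of E*_n[p^{|Pi_t|}] via the block-counting process."""
    total = 0.0
    for _ in range(reps):
        k, s = n, 0.0
        while k > 1:
            s += rng.expovariate(k * (k - 1) / 2)  # holding time with k blocks
            if s > t:
                break
            k -= 1
        total += p ** k
    return total / reps


def wright_fisher_side(n, p, t, rng, reps=5000, dt=0.01):
    """Euler-Maruyama estimate of E_p[X_t^n] for dX = sqrt(X(1-X)) dW."""
    steps = int(round(t / dt))
    total = 0.0
    for _ in range(reps):
        x = p
        for _ in range(steps):
            x += math.sqrt(x * (1 - x) * dt) * rng.gauss(0.0, 1.0)
            x = min(max(x, 0.0), 1.0)  # keep the discretised path in [0, 1]
        total += x ** n
    return total / reps


rng = random.Random(11)
p, t = 0.3, 1.0
exact = p ** 2 * math.exp(-t) + p * (1 - math.exp(-t))
print(exact, kingman_side(2, p, t, rng), wright_fisher_side(2, p, t, rng))
```

The clipping step is a crude way to keep the discretised diffusion in its state space; once a path reaches 0 or 1 it stays there, as the true diffusion does.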
The relationship (34) is called a duality relation. In general, two processes $X$ and $Y$ with respective laws $\mathbb{P}$ and $\mathbb{P}^*$ are said to be dual if there exists a function $\psi(x, y)$ such that for all $x, y$ (and all $t \ge 0$):

$$\mathbb{E}_x\big(\psi(X_t, y)\big) = \mathbb{E}^*_y\big(\psi(x, Y_t)\big). \qquad (35)$$

In our case $X_t$ is the Wright-Fisher diffusion, $Y_t = |\Pi_t|$ is the number of blocks of Kingman's coalescent, and $\psi(x, n) = x^n$. In particular, as $n$ varies, (34) fully characterizes the law of $X_t$, as it characterizes all its moments (and $X_t$ takes values in the bounded interval $[0, 1]$).

As an aside, this is a general feature of duality relations: as $y$ varies, the quantities $\mathbb{E}_x(\psi(X_t, y))$ characterize the law of $X_t$ started from $x$, provided the family of functions $x \mapsto \psi(x, y)$ is rich enough. In particular, relations such as (35) are extremely useful to prove uniqueness results for martingale problems. This method, called the duality method, has been extremely successful in the literature on interacting particle systems and superprocesses, where it is often relatively simple to guess what martingale problem a certain measure-valued diffusion should satisfy, but much more complex to prove that there is a unique in law solution to this problem. Since uniqueness is usually the key step in proving convergence in distribution of a certain discrete model towards the continuum limit specified by the martingale problem, it is easy to see why duality can be so useful. For more about this, we refer the reader to the relevant discussion in Etheridge [72].

We stress that duals are not necessarily unique: for instance, Kingman's coalescent is also dual to a process known as the Fleming-Viot diffusion, which will be discussed in a later section as it will have important consequences for us.

We now illustrate Theorems 2.6 and 2.7 with some of the promised 
applications to questions of practical interest. Consider the Moran 
model of Theorem 2.3. The most obvious question pertains to the 
following: if the a population is thought of as a mutant from the A 
population, what is the chance it will survive forever? It is easy to see 
this can only occur if the a population invades the whole population 
and all the residents (i.e., the A individuals) die out. If so, how long 
does it take? 

Let $X^N$ denote the number of $a$ individuals in a Moran model with total population size $N$ and initial $a$ population $X^N_0 = pN$, where $0 < p < 1$. Note that $X^N$ is a finite Markov chain with only two traps, 0 and $N$. Thus $X^N_\infty := \lim_{t \to \infty} X^N_t$ exists almost surely and is equal to 0 or $N$. Let $D$ be the event that $X^N_\infty = 0$ (this is the event that the $a$ alleles die out), and let $S$ be the complementary event (this is the event that $a$ wipes out $A$). In biological terms, the time at which this occurs, say $T_N$, is known as the fixation time.

Theorem 2.8. We have:

$$\mathbb{P}(S) = p; \qquad \mathbb{P}(D) = 1 - p. \qquad (36)$$

Moreover, the fixation time $T_N$ satisfies:

$$\mathbb{E}(T_N) \sim -N\big(p\log p + (1-p)\log(1-p)\big). \qquad (37)$$

Proof. The first part of the result follows directly from the observation that $X^N$ is a bounded martingale and the optional stopping theorem at time $T_N$. For the second part, we use the diffusion approximation of Theorem 2.6 and content ourselves with verifying that for the limiting Wright-Fisher diffusion, the expected time $T$ to absorption at 0 or 1 is

$$\mathbb{E}(T) = -2\big(p\log p + (1-p)\log(1-p)\big). \qquad (38)$$

Technically speaking, there is some further work to do, such as checking that $2T_N/N \to T$ in distribution and in expectation, but we are not interested in this technical point here. Note that for a diffusion, if $f(p)$ is the expected value of $T$ starting from $p$, then $f$ satisfies:

$$\begin{cases} Lf(x) = -1, & 0 < x < 1,\\ f(0) = f(1) = 0, \end{cases} \qquad (39)$$

where $Lf(x) = \frac{1}{2}x(1-x)f''(x)$ is the generator of the Wright-Fisher diffusion. To see where (39) comes from, observe that, for any $\varepsilon > 0$ and for all $0 < x < 1$,

$$f(x) = \mathbb{E}_x(T) = \mathbb{E}_x\big(\mathbb{E}_x(T \mid \mathcal{F}_\varepsilon)\big),$$

where $\mathcal{F}_\varepsilon = \sigma(X_s, s \le \varepsilon)$. Thus by the Markov property, letting $P_\varepsilon f(x)$ be the semigroup of the diffusion $X$ (and neglecting the event $\{T \le \varepsilon\}$, whose probability is $o(\varepsilon)$ for $0 < x < 1$):

$$\begin{aligned} f(x) &= \mathbb{E}_x\big(\varepsilon + \mathbb{E}_{X_\varepsilon}(T)\big)\\ &= \varepsilon + P_\varepsilon f(x)\\ &= \varepsilon + f(x) + \varepsilon Lf(x) + o(\varepsilon), \end{aligned}$$

where the last equality holds since $Lf(x) = \lim_{\varepsilon \to 0} \frac{P_\varepsilon f(x) - f(x)}{\varepsilon}$. Since this last equality must be satisfied for all $\varepsilon > 0$, we conclude, after canceling the terms $f(x)$ on both sides of this equation, dividing by $\varepsilon$ and letting $\varepsilon \to 0$:

$$1 + Lf(x) = 0$$

for all $x \in (0, 1)$, which, together with the obvious boundary conditions $f(0) = f(1) = 0$, is precisely (39).

Now, (39) can be solved explicitly and the solution is indeed (38), hence the result. □
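For completeness, here is the short verification that (38) solves (39). With $f(x) = -2\big(x\log x + (1-x)\log(1-x)\big)$,

$$f'(x) = -2\big(\log x - \log(1-x)\big), \qquad f''(x) = -2\Big(\frac{1}{x} + \frac{1}{1-x}\Big) = -\frac{2}{x(1-x)},$$

so $Lf(x) = \frac{1}{2}x(1-x)f''(x) = -1$ on $(0, 1)$, and $f(0^+) = f(1^-) = 0$ because $x\log x \to 0$ as $x \downarrow 0$.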

For $p = 1/2$, we get from (37) that

$$\mathbb{E}(T_N) \sim N\log 2 \approx 0.69N$$

or, for diploid populations with $N$ individuals (hence $2N$ genes), $\mathbb{E}(T_N) \sim 2N\log 2 \approx 1.39N$. As Ewens [80, Section 3.2] notes, this long mean time is related to the fact that the spectral gap of the chain is small.
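The asymptotics (37) can also be checked against the finite Moran model. The sketch below (our own, with a hypothetical function name) uses the jump rates described earlier in this section, $j \to j \pm 1$ each at rate $\lambda_j = j(N-j)/N$ when $j$ individuals carry $a$. The mean absorption times $m_j$ then solve $-m_{j-1} + 2m_j - m_{j+1} = 1/\lambda_j$ with $m_0 = m_N = 0$. This tridiagonal system is solved with the Thomas algorithm and compared with $-N\big(p\log p + (1-p)\log(1-p)\big)$ at $p = j/N$.

```python
import math


def moran_mean_fixation_times(N):
    """Mean absorption times m_0, ..., m_N of the Moran count chain.

    Assumed rates (as in the text): j -> j + 1 and j -> j - 1, each at rate
    lam_j = j (N - j) / N. Solves -m[j-1] + 2 m[j] - m[j+1] = 1 / lam_j for
    1 <= j <= N - 1 with m[0] = m[N] = 0, by the Thomas algorithm.
    """
    rhs = [N / (j * (N - j)) for j in range(1, N)]
    size = N - 1
    c_prime = [0.0] * size  # modified super-diagonal
    d_prime = [0.0] * size  # modified right-hand side
    c_prime[0] = -1.0 / 2.0
    d_prime[0] = rhs[0] / 2.0
    for i in range(1, size):
        denom = 2.0 + c_prime[i - 1]  # diagonal 2 minus (sub-diagonal -1) * c'
        c_prime[i] = -1.0 / denom
        d_prime[i] = (rhs[i] + d_prime[i - 1]) / denom
    m = [0.0] * (N + 1)
    m[N - 1] = d_prime[size - 1]
    for i in range(size - 2, -1, -1):  # unknown i corresponds to state j = i + 1
        m[i + 1] = d_prime[i] - c_prime[i] * m[i + 2]
    return m


def formula_37(N, p):
    return -N * (p * math.log(p) + (1 - p) * math.log(1 - p))


N = 1000
m = moran_mean_fixation_times(N)
for j in (100, 500, 900):
    print(j / N, m[j] / N, formula_37(N, j / N) / N)
```

The exact finite-$N$ values and the asymptotic formula (37) should already agree to within about one percent at $N = 1000$.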

In practice, it is often more interesting to compute the expected fixation time (and other quantities) given that the $a$ allele succeeded in invading the population. In that case it is possible to show:

$$\mathbb{E}(T_N \mid S) \sim -N\,\frac{1-p}{p}\,\log(1-p).$$

See, e.g., [66, Theorem 1.32].

2.3 Ewens' sampling formula 

We now come to one of the true cornerstones of mathematical population genetics, which is Ewens' sampling formula for the allelic partition of Kingman's coalescent (these terms will be defined in a moment). Basically, this is an exact formula which governs the patterns of genetic variation within a population satisfying all three basic assumptions leading to Kingman's coalescent. As a result, this formula is widely used in population genetics and in practical studies; its importance and impact are hard to overstate.

2.3.1 Infinite alleles model 

We now define one of the basic objects of this study, which is the allelic partition. It is based on a model called the infinite alleles model, which we now describe. Imagine that, together with the evolution of a population forwards in time (such as considered in the Moran model or in the Wright-Fisher model, say), there exists a process by which mutations occur and which induces differences between the allele observed in a child and that of its parent. If we consider a large gene (i.e., one which consists of a fairly long DNA sequence), it is reasonable to assume that each mutation makes a change never seen before and never to be reproduced again by a future mutation, that is, every mutation generates a new, unique allele. To simplify extremely, imagine we are looking at the genealogy of a gene coding for, say, eye colour. We may imagine that, initially, all individuals carry the same allele, i.e., have the same eye colour (say brown). Then as time goes by, a mutation occurs, and the child of a certain individual carries a new colour, maybe blue. His descendants will also all have blue eyes, and descendants of other individuals will have brown eyes, until one of them gets a new mutation, giving him, say, green eyes, which he will in turn transmit to his children, and so on and so forth. The allelic partition is the one that results when we identify individuals carrying the same eye colour (or, more generally, the same allele at the observed gene). We describe this partition through a vector, called the allele frequency spectrum, which simply counts the number of different alleles with a given multiplicity: that is, $a_i$ is the number of distinct alleles which are shared by exactly $i$ individuals. See Figure 6 for an illustration of the allelic partition.

For instance (the following two datasets are taken from [66], and were gathered respectively by [57] and [150]), in a study of $n = 60$ drosophilae (D. persimilis), the allelic partition was represented by

$$a_1 = 18, \quad a_2 = 3, \quad a_4 = 1, \quad a_{32} = 1.$$

That is, 1 allele was shared by 32 individuals, 1 allele was shared by 4 individuals, 3 alleles were found in pairs of individuals, and 18 individuals had a unique allele. Thus the associated partition had $18 + 3 + 1 + 1 = 23$ blocks. In another, larger study of Drosophila (D. pseudoobscura), on $n = 718$ individuals:

$$a_1 = 7, \quad a_2 = a_3 = a_5 = a_6 = a_8 = a_9 = a_{26} = a_{36} = a_{37} = 1,$$

$$a_{82} = 2, \quad a_{149} = 1, \quad a_{266} = 1.$$
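As a small illustration of this bookkeeping (our own sketch; function names are hypothetical), the allele frequency spectrum can be computed from a list of allele labels, and the two data sets above can be checked for consistency: $\sum_i i\,a_i$ must equal the sample size $n$, and $\sum_i a_i$ is the number of distinct alleles, i.e., the number of blocks of the allelic partition.

```python
from collections import Counter


def frequency_spectrum(alleles):
    """Map i -> a_i, the number of distinct alleles carried by exactly i individuals."""
    return dict(Counter(Counter(alleles).values()))


def sample_size(spectrum):
    return sum(i * a for i, a in spectrum.items())


def number_of_alleles(spectrum):
    return sum(spectrum.values())


persimilis = {1: 18, 2: 3, 4: 1, 32: 1}
pseudoobscura = {1: 7, 2: 1, 3: 1, 5: 1, 6: 1, 8: 1, 9: 1, 26: 1, 36: 1, 37: 1,
                 82: 2, 149: 1, 266: 1}
print(sample_size(persimilis), number_of_alleles(persimilis))        # 60 23
print(sample_size(pseudoobscura), number_of_alleles(pseudoobscura))  # 718 20
print(frequency_spectrum(["brown", "brown", "blue", "green", "brown"]))
```

The same two summary statistics are what enter Ewens' sampling formula below, which is why the spectrum rather than the labelled partition is the data traditionally recorded.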
Figure 6: The allelic partition generated by mutations (squares).



2.3.2 Ewens sampling formula 

Given the apparent difference between these two data sets, what can we expect from a typical sample? It is natural to assume that mutations arrive at constant rate in time and that, for many populations, assumptions (1), (2) and (3) are satisfied, so that the genealogy of a sample is described by Kingman's coalescent. Thus, we will assume that, given the coalescence tree of a sample (obtained from Kingman's coalescent), mutations fall on the coalescence tree according to a Poisson point process with constant intensity per unit length, which we define to be $\theta/2$ for some $\theta > 0$, for reasons that will be clear in a moment. We may then look at the (random) allelic partition that this model generates, and ask what this partition typically looks like. See Figure 6 for an illustration.

Note that the allelic partitions $\Pi^n$ of the first $n$ sampled individuals form a consistent family of partitions as $n$ increases, so one may define a random partition $\Pi$ of $\mathbb{N}$, called the allelic partition, such that $\Pi^n = \Pi|_{[n]}$ for all $n \ge 1$.

Theorem 2.9. Let $\Pi$ be the allelic partition obtained from Kingman's coalescent and the infinite alleles model with mutation rate $\theta/2$. Then $\Pi$ has the law of a Poisson-Dirichlet random partition with parameter $\theta$. In particular, writing $A_j$ for the number of alleles carried by exactly $j$ individuals in a sample of size $n$, the probability that $A_1 = a_1, A_2 = a_2, \ldots, A_n = a_n$ (where $\sum_{j=1}^{n} j a_j = n$) is given by:

$$p(a_1, \ldots, a_n) = \frac{n!}{\theta(\theta+1)\cdots(\theta+n-1)} \prod_{j=1}^{n} \frac{(\theta/j)^{a_j}}{a_j!}. \qquad (40)$$

Proof. We first note that (40) is simply a reformulation of Ewens' sampling formula in Theorem 1.6, taking into account the combinatorial factors needed when describing $\Pi$ indirectly through the summary statistics $(a_1, \ldots, a_n)$, which are the data traditionally recorded. Note in particular that (40) shows that the vector $(A_1, \ldots, A_n)$ has the distribution of independent Poisson random variables $Z_1, \ldots, Z_n$ with parameters $\theta/j$, conditioned on the event that $\sum_{j=1}^{n} jZ_j = n$.
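Formula (40) is easy to evaluate exactly. The sketch below (our own illustration; names, seed and parameter values are hypothetical) computes it with rational arithmetic and checks that it sums to 1 over all integer partitions of $n$. It then compares it with a simulation of the Chinese Restaurant Process with parameter $\theta$ that appears in the proof, assuming the standard seating rule: the $(m+1)$th customer opens a new table with probability $\theta/(m+\theta)$ and joins a table of current size $n_j$ with probability $n_j/(m+\theta)$.

```python
import random
from collections import Counter
from fractions import Fraction
from math import factorial


def ewens_probability(spectrum, theta):
    """Right-hand side of (40); spectrum maps j -> a_j."""
    n = sum(j * a for j, a in spectrum.items())
    prob = Fraction(factorial(n))
    for i in range(n):
        prob /= theta + i  # divide by theta (theta + 1) ... (theta + n - 1)
    for j, a in spectrum.items():
        prob *= (theta / j) ** a / factorial(a)
    return prob


def integer_partitions(n, largest=None):
    """Yield the partitions of n as nonincreasing lists of parts."""
    largest = n if largest is None else largest
    if n == 0:
        yield []
        return
    for part in range(min(n, largest), 0, -1):
        for rest in integer_partitions(n - part, part):
            yield [part] + rest


def crp_spectrum(n, theta, rng):
    """Seat n customers in a Chinese Restaurant Process; return j -> a_j."""
    tables = []
    for m in range(n):  # m customers are already seated
        u = rng.random() * (m + theta)
        if u < theta:
            tables.append(1)  # open a new table (a new allele)
            continue
        u -= theta
        for i, size in enumerate(tables):
            if u < size:
                tables[i] += 1
                break
            u -= size
        else:  # guard against floating-point round-off
            tables[-1] += 1
    return dict(Counter(tables))


theta = Fraction(3, 2)
total = sum(ewens_probability(dict(Counter(p)), theta) for p in integer_partitions(6))
print(total)  # 1

rng = random.Random(3)
reps = 20000
hits = sum(crp_spectrum(3, 1.0, rng) == {1: 3} for _ in range(reps))
print(hits / reps, float(ewens_probability({1: 3}, Fraction(1))))  # second value is 1/6
```

Passing $\theta$ as a `Fraction` keeps (40) exact, so the normalisation check is an equality rather than a floating-point approximation.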

A particularly simple and elegant proof that $\Pi$ has the Poisson-Dirichlet$(0, \theta)$ distribution consists in showing directly that $\Pi$ can be constructed as a Chinese Restaurant Process with parameter $\theta$. Suppose the coalescence tree is drawn with the root at the bottom and the $n$ leaves at the top (like a real-life tree!). For every leaf of the tree, there is a unique path between it and the root, and there is a unique first mutation mark on this path, which we run from the leaf to the root, i.e., top to bottom (if no such mark exists, we may consider extending the coalescence tree by adding an infinite branch from the root, on which such a mark will almost surely exist). Note that to describe the allelic partition $\Pi^n$, it suffices to know the portion of the coalescence tree between all the leaves and their first marks; we do not care about later coalescence events or mutations (here, time is also running in the coalescence direction, i.e., from top to bottom). This consideration leads us to associate to the marked coalescence tree a certain forest, i.e., a collection of trees, which are subtrees of the coalescence tree and contain all the leaves and their nearest marks.

Define < r„ < T„_i < ...T2 < Ti to be the times at which 
there is an event which reduces the number of branches in the forest 
described above. These events may be of two types: either a coales- 
cence or a mutation (which "kills" the corresponding branch). In ei- 
ther case, the number of branches decreases by 1, so there are exactly 
n such times until there are no branches left. The Chinese Restau- 
rant Process structure of the partition is revealed when we try to 
describe this forest from bottom to top, by looking at what happens 




n 



p{ai,...,an 



) 



(40) 
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at the reverse times Ti, . . . , T„. At each time Tm, 1 < m < n, 
we may consider the partition 11^, which is the partition defined by 
the m hneages uncovered by time T^- At time Ti, we are adding a 
hneage which may be considered the root of this forest. Naturahy, 
the partition it defines is just {!}. Suppose now that m > 1, and 
we are looking at the distribution of the partition n^_,_]^, given IIJ^. 
This (m+1)th lineage disappeared from the forest for one of two
reasons: either by coalescence, or by mutation. If it was a mutation,
then we know this lineage will open a new block of the partition
Π_{m+1}. If, however, it was by a coalescence, then our (m+1)th
customer joins one of the existing blocks. It remains to compute the
probabilities of these events. Note that between times τ_{m+1} and τ_m
there are precisely m + 1 lineages. Suppose the block sizes of Π_m
are respectively n_1, ..., n_k. The total rate of coalescence when
there are m + 1 lineages is m(m + 1)/2, and the total mutation rate
is (m + 1)θ/2. It follows that

$$ P(\text{new block} \mid \Pi_m) = \frac{(m+1)\theta/2}{m(m+1)/2 + \theta(m+1)/2} = \frac{\theta}{m+\theta}. $$

Indeed, the event that τ_{m+1} is a mutation time rather than a
coalescence time is independent of Π_m, by the strong Markov
property of Kingman's coalescent at time τ_{m+1}. Similarly,

$$ P(\text{join a given block of size } n_i \mid \Pi_m) = \frac{m(m+1)/2}{m(m+1)/2 + \theta(m+1)/2} \cdot \frac{n_i}{m} = \frac{n_i}{m+\theta}. \tag{41} $$
Indeed, in order to join a table of size n_i, first a coalescence must have
occurred (this is the first factor in the right-hand side of (41)); then
we note that conditionally on this event, the new lineage coalesced
with a uniformly chosen one of the other m lineages, and thus a particular
group of size n_i was chosen with probability n_i/m. We recognize here the
transitions of the Chinese Restaurant Process, so by Theorem 1.5 we get the
desired result. □
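As a quick sanity check on (40), the following Python sketch evaluates the Ewens sampling formula in exact rational arithmetic and sums it over all allele-count vectors (a_1, ..., a_n) for a small n. The helper names (`integer_partitions`, `allele_counts`, `ewens_probability`) are hypothetical and chosen only for this illustration; the sum should be exactly 1, and the one-block configuration should receive the Chinese Restaurant Process probability ∏_{m=1}^{n-1} m/(m+θ).

```python
from fractions import Fraction
from math import factorial


def integer_partitions(n, max_part=None):
    """Yield the partitions of the integer n as non-increasing tuples of block sizes."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - part, part):
            yield (part,) + rest


def allele_counts(block_sizes, n):
    """Summary statistics (a_1, ..., a_n): a_j is the number of blocks of size j."""
    a = [0] * (n + 1)
    for size in block_sizes:
        a[size] += 1
    return a[1:]


def ewens_probability(a, theta):
    """Ewens sampling formula (40), in exact arithmetic, for counts a = (a_1, ..., a_n)."""
    n = len(a)
    theta = Fraction(theta)
    rising = Fraction(1)
    for i in range(n):
        rising *= theta + i                    # theta (theta+1) ... (theta+n-1)
    prob = Fraction(factorial(n)) / rising
    for j, a_j in enumerate(a, start=1):
        prob *= (theta / j) ** a_j / factorial(a_j)
    return prob


theta, n = Fraction(3, 2), 6
total = sum(ewens_probability(allele_counts(p, n), theta) for p in integer_partitions(n))
print(total)  # 1
```

Exact fractions are used so that the normalisation check is an equality rather than a floating-point approximation.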

Remarkably, when W. Ewens [79] found this formula (40), he did so
without the help of Kingman's coalescent, although this viewpoint
clearly helps a great deal. There exist numerous proofs of Ewens'
sampling formula for the infinite alleles model, several of which can be
found in [95], together with some interesting extensions.



2.3.3 Some applications: the mutation rate 

We briefly survey some applications of the above result. One chief
application of Theorem 2.9 is the estimation of the mutation rate θ/2.
Note that, by Ewens' sampling formula, the conditional distribution
of Π_n given the number of blocks K_n is

$$ P(\Pi_n = \pi \mid K_n = k) = C_{n,k} \prod_{i=1}^{k} (n_i - 1)!, \tag{42} $$

where n_1, ..., n_k are the sizes of the blocks of π, and
$C_{n,k} = \big(\sum_{\pi'} \prod_{i=1}^{k} (n'_i - 1)!\big)^{-1}$, where the sum is over
all partitions π' of [n] with k blocks, of sizes n'_1, ..., n'_k.
Equivalently, since $\sum_{j=1}^{n} a_j = k$,

$$ p(a_1,\dots,a_n \mid k) = C'_{n,k} \prod_{j=1}^{n} \frac{1}{j^{a_j}\, a_j!} \tag{43} $$

for a different normalizing constant C'_{n,k}. The striking feature of
(42) and (43) is that neither right-hand side depends on θ. In
particular, we cannot learn anything about θ beyond what we can
tell from simply looking at the number of blocks. In statistical terms,
K_n is a sufficient statistic for θ. This raises the question: how to
estimate θ from the number of blocks?
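The normalising sum in (42) can be identified concretely: the sum over partitions of [n] with k blocks of ∏(n_i − 1)! counts permutations of [n] with k cycles, i.e., it is the unsigned Stirling number of the first kind |s(n,k)|. Consequently P(K_n = k) = |s(n,k)| θ^k / (θ(θ+1)⋯(θ+n−1)), in which θ enters only through k. The sketch below (helper names are hypothetical) checks the identification by brute-force enumeration for small n and checks that this law of K_n sums to 1.

```python
from fractions import Fraction
from math import factorial, prod


def set_partitions(elements):
    """Yield every set partition of the list `elements`, as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        for i in range(len(partition)):        # add `first` to an existing block ...
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        yield [[first]] + partition            # ... or give it a block of its own


def stirling_first_unsigned(n, k):
    """|s(n, k)|, the number of permutations of n elements with exactly k cycles."""
    table = [[0] * (n + 1) for _ in range(n + 1)]
    table[0][0] = 1
    for m in range(1, n + 1):
        for j in range(1, m + 1):
            table[m][j] = table[m - 1][j - 1] + (m - 1) * table[m - 1][j]
    return table[n][k]


def inverse_normalizer(n, k):
    """1 / C_{n,k}: sum over partitions of [n] with k blocks of prod_i (n_i - 1)!."""
    return sum(prod(factorial(len(block) - 1) for block in p)
               for p in set_partitions(list(range(1, n + 1))) if len(p) == k)


def blocks_distribution(n, theta):
    """P(K_n = k) = |s(n,k)| theta^k / (theta (theta+1) ... (theta+n-1)), k = 1..n."""
    theta = Fraction(theta)
    rising = prod(theta + i for i in range(n))
    return {k: stirling_first_unsigned(n, k) * theta ** k / rising for k in range(1, n + 1)}


print([inverse_normalizer(5, k) for k in range(1, 6)])       # [24, 50, 35, 10, 1]
print([stirling_first_unsigned(5, k) for k in range(1, 6)])  # [24, 50, 35, 10, 1]
```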

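Theorem 2.10. Let Π be a PD(θ) random partition, and let Π_n be
its restriction to [n], with K_n blocks. Then

$$ \frac{K_n}{\log n} \longrightarrow \theta \quad \text{a.s.} \tag{44} $$

as n → ∞. Moreover,

$$ \frac{K_n - \theta \log n}{\sqrt{\theta \log n}} \xrightarrow{\;d\;} \mathcal{N}(0,1). \tag{45} $$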



Proof. (44) is an easy consequence of the Chinese Restaurant Process
construction of a PD(θ) random partition. Indeed, let I_i be the
indicator random variable that customer i opens a new block. Then
$K_n = \sum_{i=1}^{n} I_i$ and the random variables I_i are independent, with

$$ P(I_i = 1) = \frac{\theta}{\theta + i - 1}. $$

Thus E(K_n) ∼ θ log n, and var(K_n) ≤ E(K_n). Find a subsequence
n_k such that k^2 ≤ E(K_{n_k}) ≤ (k + 1)^2. (Thus if θ = 1, n_k = e^{k^2}
works.) By Chebyshev's inequality,

$$ P\big(|K_{n_k} - E(K_{n_k})| > \varepsilon \log n_k\big) \le \frac{\operatorname{var}(K_{n_k})}{\varepsilon^2 (\log n_k)^2} \le \frac{C}{\varepsilon^2 k^2} $$

for every ε > 0, where C depends only on θ. By the Borel-Cantelli lemma,
there is almost sure convergence along the subsequence n_k. Now, using
monotonicity of K_n and the sandwich theorem, we see that for n such
that n_k ≤ n ≤ n_{k+1}, we get

$$ \frac{K_{n_k}}{\log n_{k+1}} \le \frac{K_n}{\log n} \le \frac{K_{n_{k+1}}}{\log n_k}, $$

but since 1 ≤ log(n_{k+1})/log(n_k) → 1, we see that both left- and
right-hand sides of the inequalities converge to θ almost surely. This
proves (44). The central limit theorem (45) follows from a similar
application of the Lindeberg-Feller theorem for triangular arrays of
independent random variables (see Theorem 4.5 in [65]). □
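The proof rests on writing K_n as a sum of independent Bernoulli(θ/(θ+i−1)) indicators. The Python sketch below (function names are hypothetical) uses that representation to compute E(K_n) and var(K_n) exactly and to simulate K_n; printing E(K_n)/log n and the standard deviation of K_n/log n for growing n shows how slowly the ratio settles near θ.

```python
import math
import random


def expected_blocks(n, theta):
    """E(K_n) = sum_{i=1}^n theta / (theta + i - 1)."""
    return sum(theta / (theta + i - 1) for i in range(1, n + 1))


def variance_blocks(n, theta):
    """var(K_n) = sum_i p_i (1 - p_i), the indicators I_i being independent Bernoulli(p_i)."""
    probs = (theta / (theta + i - 1) for i in range(1, n + 1))
    return sum(p * (1 - p) for p in probs)


def simulate_blocks(n, theta, rng):
    """K_n for one run of the CRP: customer i opens a new table w.p. theta/(theta+i-1)."""
    return sum(1 for i in range(1, n + 1) if rng.random() < theta / (theta + i - 1))


theta = 1.0
for n in (10**2, 10**4):
    ratio = expected_blocks(n, theta) / math.log(n)
    sd = math.sqrt(variance_blocks(n, theta)) / math.log(n)
    print(n, round(ratio, 3), round(sd, 3))
```

Only the indicators matter for K_n, so the simulation never stores table sizes.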

As Durrett [66] observes, while Theorem 2.10 is satisfactory from
a theoretical point of view, as it provides us with a way to estimate
the mutation rate θ, in practice the convergence rate of 1/\sqrt{\log n} is
very slow: it takes n = e^{100} to get a standard deviation of about 0.1
(when θ = 1).

In fact, it can be shown using Ewens' sampling formula that the
maximum-likelihood estimator of θ is the value θ̂ which makes E_θ(K_n)
equal to the observed number of blocks. The variance of that estimator
is also roughly the same, as can be shown using the general theory of
maximum likelihood (see below Theorem 1.13 in [66]):

$$ \operatorname{var}(\hat\theta) \approx \left(\frac{\operatorname{var}(K_n)}{\theta^{2}}\right)^{-1} \sim \frac{\theta}{\log n}, $$

which is of the same order of magnitude as before, i.e., very slow. Other
ideas to estimate θ can be used, such as the sample homozygosity:
this is the proportion of pairs of individuals in our sample who share
the same allele. See [66, Section 1.3] for an analysis of that functional.
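Since P(K_n = k) is proportional to θ^k/(θ(θ+1)⋯(θ+n−1)), setting the derivative of the log-likelihood to zero gives k/θ = Σ_{i=0}^{n−1} 1/(θ+i), i.e., E_θ(K_n) = k. A minimal Python sketch of this estimator, assuming plain bisection as the root finder (any monotone solver would do), is given below; the names and the numbers n = 1000, k = 12 are illustrative only and are not data.

```python
import math


def expected_blocks(n, theta):
    """E_theta(K_n) = sum_{i=0}^{n-1} theta / (theta + i), increasing in theta."""
    return sum(theta / (theta + i) for i in range(n))


def theta_mle(n, k, rel_tol=1e-12):
    """Maximum-likelihood estimate of theta from K_n = k, for 1 < k < n.

    Solves E_theta(K_n) = k by bisection; E_theta(K_n) increases from 1
    (theta -> 0) to n (theta -> infinity), so the root is unique.
    """
    if not 1 < k < n:
        raise ValueError("a finite positive estimate exists only when 1 < k < n")
    lo, hi = 0.0, 1.0
    while expected_blocks(n, hi) < k:          # bracket the root
        hi *= 2.0
    while hi - lo > rel_tol * hi:
        mid = 0.5 * (lo + hi)
        if expected_blocks(n, mid) < k:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n, k = 1000, 12                                # illustrative numbers, not data
theta_hat = theta_mle(n, k)
print(theta_hat, math.sqrt(theta_hat / math.log(n)))   # estimate, rough standard error
```

The second printed number uses the approximation var(θ̂) ≈ θ/log n from the text, so it is only indicative for moderate n.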
2.3.4 The site frequency spectrum

Along with the infinite alleles model, there is a different model for
genetic variation, called the infinite sites model. For arbitrary
reasons, we have decided to spend less time on this model than on the
infinite alleles model, and the reader wanting to move on to the next
subject is invited to do so. However, this model will come back as
a very useful theoretical tool in later models of Λ-coalescents, as it
turns out to be closely related to the infinite alleles model but is
in some respects easier to analyse.

The infinite sites model looks at a type of data which is altogether
different from the one we were trying to model with the infinite alleles
model. Suppose we look at a fixed chromosome (rather than a fixed
locus on that chromosome). We are interested in seeing which sites
of the chromosome are subject to variation. Suppose, for instance,
that the chromosome is made of 10 nucleotides, i.e., is a word of 10
letters in the alphabet {A, T, C, G}. In a sample of n individuals, we can
observe these 10 nucleotides simultaneously. It then makes sense
to ask which of those show variation: for instance, we may observe
that nucleotide 5 is the same in all individuals, while nucleotide 3 has
different variants present in a number of individuals, say k. These k
individuals need not carry the same variant at site 3, nor agree at
other sites, and so it would seem that by observing this data we gain
more insight about the genealogical relationships of the individuals
in our sample (as we can observe several loci at once!), but it turns
out that this is not the case.

The infinite sites model starts with the assumption that each new
mutation affects a new site (locus) of the chromosome, never touched
before or after. This mutation is transmitted unchanged to all
the descendants of the corresponding individual, and will be visible
forever. Hence mutations just accumulate, instead of erasing
each other. Our assumptions will still be that the mutation rate
is constant, and that, given the coalescence tree on n individuals
(a realisation of Kingman's n-coalescent), mutations fall on it as a
Poisson point process with constant intensity θ/2 per unit length.
In this model, there is no natural partition to define, but it makes
sense to ask:

1. What is the total number S_n of sites at which there is some
variation?
2. What is the number M_j(n) of sites at which exactly j
individuals have a mutation?

S_n is called the number of segregating sites; such sites are often
referred to in the biological literature as SNPs (Single Nucleotide
Polymorphisms). The statistic (M_1(n), M_2(n), ..., M_n(n)) is called the site
frequency spectrum of the sample. Note that $\sum_{j=1}^{n} M_j(n) = S_n$.
Note also that S_n may be constructed simultaneously for all n ≥ 1
by enlarging the sample and using the consistency of Kingman's
coalescent.
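The model just described is easy to simulate, which gives a useful numerical companion to Theorem 2.11. The Python sketch below is a minimal Monte Carlo version, assuming a hand-rolled Poisson sampler (Python's `random` module does not provide one) and hypothetical function names. While k lineages remain, it waits an exponential time with mean 2/(k(k−1)), drops Poisson(θ/2 × length) mutations on every current branch, and credits each mutation to M_s, where s is the number of leaves below that branch. The averages can be compared with θh_{n−1} for S_n and θ/j for M_j(n).

```python
import random


def poisson(rng, mean):
    """Poisson(mean) sample: count rate-1 exponential arrivals in [0, mean)."""
    count = 0
    t = rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def infinite_sites_sfs(n, theta, rng):
    """One draw of (M_1(n), ..., M_{n-1}(n)) under Kingman's n-coalescent.

    Mutations fall on each branch as a Poisson process with intensity theta/2 per
    unit length; a mutation on a branch subtending s leaves is carried by exactly
    s sampled individuals (infinite sites: no site is ever hit twice).
    """
    sizes = [1] * n                  # number of leaves below each current lineage
    sfs = [0] * n                    # sfs[s] = M_s(n); index 0 is unused
    while len(sizes) > 1:
        k = len(sizes)
        wait = rng.expovariate(k * (k - 1) / 2)     # exponential, mean 2/(k(k-1))
        for s in sizes:
            sfs[s] += poisson(rng, theta / 2 * wait)
        i, j = sorted(rng.sample(range(k), 2))      # uniformly chosen merging pair
        merged = sizes[i] + sizes[j]
        del sizes[j], sizes[i]                      # larger index first, so i stays valid
        sizes.append(merged)
    return sfs[1:]


def average_sfs(n, theta, reps, seed=0):
    """Monte Carlo averages of S_n and of (M_1(n), ..., M_{n-1}(n))."""
    rng = random.Random(seed)
    means = [0.0] * (n - 1)
    for _ in range(reps):
        for j, m in enumerate(infinite_sites_sfs(n, theta, rng)):
            means[j] += m / reps
    return sum(means), means


s_mean, m_means = average_sfs(8, 2.0, 2000, seed=1)
print(round(s_mean, 2), [round(m, 2) for m in m_means])
```

No branch subtends all n leaves, so M_n(n) = 0 and the returned list stops at M_{n−1}(n).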



Theorem 2.11. We have:

$$ \frac{S_n}{\log n} \longrightarrow \theta \quad \text{a.s.} \tag{46} $$

as n → ∞, and furthermore for all 1 ≤ j ≤ n − 1:

$$ E(M_j(n)) = \frac{\theta}{j}. \tag{47} $$

Proof. Naturally, the reader is invited to draw a parallel with
Theorem 2.10. The result (46) is conceptually slightly simpler, as, given
the coalescence tree, S_n is simply a Poisson random variable with
mean θL_n/2, where L_n is the total length of the tree, i.e., the sum
of the lengths of all the branches in the tree. But observe that while
there are k lineages, the time it takes to get a coalescence is
exponential with mean 2/(k(k − 1)). Thus, since there are exactly k branches
during this interval of time, we get:

$$ E(L_n) = \sum_{k=2}^{n} k \cdot \frac{2}{k(k-1)} = 2\sum_{j=1}^{n-1} \frac{1}{j} = 2h_{n-1}, $$

where h_n = \sum_{j=1}^{n} 1/j is the n-th harmonic number. Thus

$$ E(S_n) = E\big(E(S_n \mid L_n)\big) = \frac{\theta}{2}\,E(L_n) = \theta h_{n-1} \sim \theta \log n. $$

Easy large deviations estimates for sums of exponential random variables
and Poisson random variables, together with the Borel-Cantelli lemma,
show the strong law of large numbers (46).



Coalescent theory 



62 



For the site frequency spectrum, we use the Moran model
embedding of Kingman's coalescent (Theorem 2.3). Let N denote the
population size, and let n = N be the sample size: that is, the sample
is the whole population. Recall that the genealogy of these n
individuals, sped up by a factor (N − 1)/2, is a realisation of Kingman's
n-coalescent. Note that if a mutation appears at time −t, the
probability that it affects exactly k individuals is precisely p_t(1, k), where
p_t(x, y) denotes the transition probabilities of the Markov chain (on
the discrete state space {0, ..., N}) which counts the number of
individuals carrying that mutation. Since mutations only accumulate,
we get by integration over t:

$$ E(M_j(n)) = \int_0^\infty p_t(1,j)\,\theta\,dt = \theta\,G(1,j) \tag{48} $$

where G(x, y) is the Green function of the Markov chain, which
computes the total expected time spent at y by the chain started from x.
(Since this chain gets absorbed in finite time (with finite expectation)
at the state 0 or N, the Green function is finite.) It is moreover easy
to compute this Green function explicitly. Let Ĝ denote the Green
function of the discrete-time jump chain X, which counts expected
numbers of visits. Note that

$$ G(x,y) = \frac{\hat G(x,y)}{q(y)} \tag{49} $$

where q(y) is the total rate at which the chain leaves state y. In our
case, that is 2y(N − y)/N. Now, note that Ĝ(k, k) is 1/p, where p is
the probability of never coming back to k starting from k. Indeed, the
number of visits to k started from k is geometric with this parameter.
When the chain leaves k, it is equally likely to go up or down. Using
the fact that X is a martingale, and the optional stopping theorem
as in Theorem 2.8, we get:

$$ p = \frac{1}{2}\cdot\frac{1}{k} + \frac{1}{2}\cdot\frac{1}{N-k} = \frac{N}{2k(N-k)}. $$
Thus

$$ \hat G(k,k) = \frac{2k(N-k)}{N}. \tag{50} $$

Now, note that by the Markov property,

$$ \hat G(1,k) = P_1(T_k < T_0)\,\hat G(k,k) = \frac{1}{k}\,\hat G(k,k), $$

and thus we get:

$$ \hat G(1,k) = \frac{2(N-k)}{N}. \tag{51} $$

Remembering (49):

$$ G(1,k) = \frac{\hat G(1,k)}{q(k)} = \frac{1}{k}. $$

Using (48), this completes the proof. □
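The Green function computation can also be checked by brute force. The Python sketch below is an illustration, not part of the original argument: it solves, in exact rational arithmetic, the linear system ĝ(y) = 1{y = 1} + ĝ(y−1)/2 + ĝ(y+1)/2 with ĝ(0) = ĝ(N) = 0 satisfied by the expected numbers of visits of the symmetric walk started at 1, using the standard tridiagonal (Thomas) algorithm. It then divides by q(k) = 2k(N−k)/N as in (49). The function names are hypothetical; the output should reproduce (51) and G(1, k) = 1/k.

```python
from fractions import Fraction


def green_from_one(N):
    """G_hat(1, y) for y = 1..N-1: expected visits to y of the symmetric walk
    started at 1 and absorbed at 0 and N.

    Solves g(y) - g(y-1)/2 - g(y+1)/2 = [y == 1], g(0) = g(N) = 0, with the
    Thomas algorithm for tridiagonal systems (exact arithmetic).
    """
    m = N - 1                                   # transient states 1..N-1
    a = c = Fraction(-1, 2)                     # sub- and super-diagonal; diagonal is 1
    cp, dp = [], []
    cp_prev = dp_prev = Fraction(0)
    for i in range(m):                          # forward sweep
        rhs = Fraction(1 if i == 0 else 0)
        denom = 1 - a * cp_prev
        cp_prev, dp_prev = c / denom, (rhs - a * dp_prev) / denom
        cp.append(cp_prev)
        dp.append(dp_prev)
    g = [Fraction(0)] * m
    g[-1] = dp[-1]
    for i in range(m - 2, -1, -1):              # back substitution
        g[i] = dp[i] - cp[i] * g[i + 1]
    return {y: g[y - 1] for y in range(1, N)}


def continuous_green_from_one(N):
    """G(1, k) = G_hat(1, k) / q(k) with q(k) = 2k(N-k)/N, as in (49)."""
    g_hat = green_from_one(N)
    return {k: g_hat[k] / Fraction(2 * k * (N - k), N) for k in g_hat}


N = 10
print([str(v) for v in green_from_one(N).values()])
print([str(v) for v in continuous_green_from_one(N).values()])
```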
3 Λ-coalescents

In this chapter we introduce the Λ-coalescent processes, also known
as coalescents with multiple collisions. We show a useful and
intuitive construction of these processes in terms of a certain Poisson
point process, and analyse the phenomenon of coming down from
infinity for these processes. We explain the relevance of these
processes to the genealogy of populations through two models, one due
to Schweinsberg, and another one due to Durrett and Schweinsberg:
as we will see, these processes describe the genealogy of a
population either when there is a very high variability in the offspring
distribution, or if we take into account a form of selection and
recombination (these terms will be defined below). Finally, we give a
brief introduction to the work of Bertoin and Le Gall about these
processes.

3.1 Definition and construction 
3.1.1 Motivation 

We saw in the previous chapter how Kingman's coalescent is a suit- 
able approximation for the genealogy of a sample from a population 
which satisfies a certain number of conditions such as constant pop- 
ulation size, mean-field interactions and neutral selection, as well as 
low offspring variability. When these assumptions are not satisfied, 
i.e., for instance, when size fluctuations cannot be neglected, or when 
selection cannot be ignored, we need some different kind of coales- 
cence process to model the genealogy. Assume for instance that the 
population size has important fluctuations. E.g., assume that from 
time to time there are "bottlenecks" in which the population size is 
very small, due to periodical environmental conditions for instance. 
In our genealogy, this will correspond to times at which a large pro- 
portion of the lineages will coalesce. Similarly, if there is a large 
impact of selection, individuals who get a beneficial mutation will
quickly recolonize an important fraction of the population, hence we
will observe multiple collisions when tracing the ancestral lineages at
this time. Large variability in the offspring distribution, as found in
many coastal marine species, also leads to the same property that many
lineages may coalesce at once. In those situations, Kingman's
coalescent is clearly not a suitable approximation and one must look for
coalescent processes which allow for multiple mergers.
3.1.2 Definition 

Of course, from the point of view of sampling, it still makes sense to
require that our coalescing process be Markovian and exchangeable,
and also that it defines a consistent process: that is, there exists an
array of numbers (λ_{b,k})_{2≤k≤b} which gives us the rate at which any
fixed k-tuple of blocks merges when there are b blocks in total. To ask
that it is consistent is to ask that these numbers do not depend on
n (the sample size), and that the numbers λ_{b,k} satisfy the recursion:

$$ \lambda_{b,k} = \lambda_{b+1,k} + \lambda_{b+1,k+1}. \tag{52} $$

Indeed, a given group of k blocks among b may coalesce in two ways
when we reveal an extra block b + 1: either these k coalesce by themselves
without the extra block, or they coalesce together with it. (For
reasons that will become clear later, at the moment we do not allow for
more than 1 merger at a time: that is, several blocks are allowed to
merge into 1 at a given time, but there cannot be more than 1 such
merger at any given time.) We will refer to coalescent processes with
no simultaneous mergers as simple.

Definition 3.1. A family of n-coalescents is any family of simple,
Markovian, 𝒫_n-valued coalescing processes (Π^n_t, t ≥ 0), such that Π^n_t
is exchangeable for any t ≥ 0 and consistent in the sense that the
law of Π^n restricted to [m] is that of Π^m, for every 1 ≤ m ≤ n. It is
uniquely specified by an array of numbers satisfying the consistency
condition (52), where for 2 ≤ k ≤ b:

λ_{b,k} = merger rate of any given k-tuple of blocks among b blocks.

Naturally, given any consistent family of n-coalescents, there exists
a Markovian process Π with values in 𝒫, unique in law, such that the
restriction of Π to 𝒫_n has the law of Π^n.

Definition 3.2. The process (Π_t, t ≥ 0) is a Λ-coalescent, or
coalescent with multiple collisions.
3.1.3 Pitman's structure theorems

Multiple collisions, of course, refer to the fact that there are times at
which more than 2 blocks may merge, but also implicitly to the fact
that such mergers do not occur simultaneously. (Processes without
this last restriction have also been studied, most notably in [143].
They have enjoyed renewed interest in recent years, see, e.g., Taylor
and Veber [149] or Birkner et al. [35].)

The name Λ-coalescent, however, may seem mysterious to the
reader at this point. It comes from the following beautiful
characterisation of coalescents with multiple collisions, which is due to Pitman
[131].

Theorem 3.1. Let Π be a coalescent with multiple collisions
associated with the array of numbers (λ_{b,k})_{2≤k≤b}. Then there exists a finite
measure Λ on the interval [0, 1] such that:

$$ \lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\,\Lambda(dx) \qquad (2 \le k \le b). \tag{53} $$

The measure Λ uniquely characterizes the law of Π, which is then
called a Λ-coalescent.

Proof (sketch). Pitman's proof is based on De Finetti's theorem for
exchangeable sequences of 0's and 1's: it turns out that (52) is
precisely the necessary and sufficient condition to have:

$$ \lambda_{i+j+2,\,i+2} = \lambda_{2,2}\,E\big(X^{i}(1-X)^{j}\big), \qquad i, j \ge 0, $$

for some random variable 0 ≤ X ≤ 1. (See (23)-(25) in [131].) □

This proof is clean but not very intuitive. We will launch below
into a long digression about this result, which we hope has the merit
of explaining why this result is true, even though, unfortunately, this
heuristic does not seem to yield a rigorous proof. However, along
the way we will also uncover a useful probabilistic structure beneath
(53), which will then be used to produce an elegant construction of
Λ-coalescents.

The bottom line of this explanation is that Theorem 3.1 should be
regarded as a Levy-Ito decomposition for the process Π. The main
reason for this is as follows. Because we treat blocks as exchangeable
particles (in particular, we do not differentiate between an infinite
block and a block of size 1), it is easy to convince oneself that a
coalescent with multiple collisions (Π_t, t ≥ 0) (in the sense of Definition
3.2) is a Levy process, in the sense that for every t, s ≥ 0 we may
write, given F_t = σ(Π_s, s ≤ t):

$$ \Pi_{t+s} = \Pi_t \star \Pi'_s, \tag{54} $$

where Π'_s is independent from F_t and has the same distribution as
Π_s. Here, the ⋆ operation is defined as follows: for a partition
π = (B_1, ...) and a partition π' = (B'_1, ...), the partition ρ = π ⋆ π' is
defined by saying that we coagulate all the blocks of π whose labels
are in the same block of π': for instance, if i and j are in the same
block of π', then B_i and B_j will be subsets of a single block of ρ. The
operation ⋆ is noncommutative and does not turn 𝒫 into a group.
However, it does turn it into what is known in abstract algebra as a
monoid (i.e., the operation is associative and has a neutral element,
which is the trivial partition into singletons).
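To make the coagulation operator concrete, here is a small Python sketch on finite partitions, assuming blocks are labelled by increasing least element (the convention used in the text) and represented as lists of integers. The names `normalize`, `coagulate` and `singletons` are hypothetical. The usage lines check associativity on one example; the test below also checks the two-sided neutral element and exhibits non-commutativity.

```python
def normalize(blocks):
    """Sort each block and order blocks by least element (the labelling used in the text)."""
    return sorted((sorted(block) for block in blocks if block), key=lambda block: block[0])


def coagulate(pi, pi_prime):
    """Return pi * pi_prime: blocks B_i, B_j of pi are merged when i, j share a block of pi_prime.

    pi is a partition of {1, ..., n} given as a list of blocks; pi_prime is a partition
    of a label set containing {1, ..., len(pi)}. Labels above len(pi) are ignored.
    """
    pi = normalize(pi)
    merged = [[x for label in block if label <= len(pi) for x in pi[label - 1]]
              for block in pi_prime]
    return normalize(merged)


def singletons(n):
    """The partition of {1, ..., n} into singletons (the neutral element)."""
    return [[i] for i in range(1, n + 1)]


pi = [[1, 4], [2], [3, 5], [6]]      # 4 blocks
pi1 = [[1, 3], [2], [4]]             # partition of the labels {1, ..., 4}
pi2 = [[1], [2, 3]]                  # partition of the labels {1, 2, 3}
print(coagulate(coagulate(pi, pi1), pi2))   # [[1, 3, 4, 5], [2, 6]]
print(coagulate(pi, coagulate(pi1, pi2)))   # [[1, 3, 4, 5], [2, 6]]
```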

The identity (54) says that (Π_t, t ≥ 0) may be considered a Levy
process in the monoid (𝒫, ⋆). At this point, it is useful to remind the
reader what a Levy process is: a real-valued process (X_t, t ≥ 0) is
called Levy if it has independent and stationary increments: for every
t ≥ 0, the process (X_{t+s} − X_t, s ≥ 0) is independent from F_t and has
the same distribution as the original process X. The simplest examples of
Levy processes are of course Brownian motion and the simple Poisson
process (an excellent introduction to Levy processes can be found in
[27]). The most fundamental result about Levy processes is the
Levy-Ito decomposition, which says that any real-valued Levy process can
be decomposed as a sum of a Brownian motion, a deterministic drift,
and compensated Poisson jumps. The simplest way to express this
decomposition is to say that the characteristic function of X_t may be
written as:

$$ E\big(e^{iqX_t}\big) = \exp\big(t\,\psi(q)\big) $$

where

$$ \psi(q) = i c_1 q - c_2 \frac{q^2}{2} + \int_{\mathbb{R}} \big(e^{iqx} - 1 - iqx\,\mathbf{1}_{\{|x|<1\}}\big)\,\nu(dx). \tag{55} $$

Here, c_1 ∈ ℝ, c_2 ≥ 0 and ν(dx) is any measure on ℝ such that

$$ \int_{\mathbb{R}} \big(|x|^2 \wedge 1\big)\,\nu(dx) < \infty. \tag{56} $$
A measure which satisfies (56) is called a Levy measure, and the
formula (55) is called the Levy-Khintchin formula. It says in
particular that the evolution of X is determined by a Brownian evolution
together with jumps and a deterministic drift, where the rate at which the
process makes jumps of size x is precisely ν(dx). The integrability
condition (56) is precisely what must be satisfied in order to make
rigorous sense of that description through a system of compensation
of these jumps by a suitable drift.

The notion of Levy process can be extended to a group G, where
we require only that the process X satisfy: X(t)^{-1}X(t + s)
is independent from F_t and has the same distribution as X(s), for
every s, t ≥ 0. Without entering into any detail, when the group
G is (locally compact) abelian, the Fourier analysis approach to the
Levy-Khintchin formula (55) is easy to extend (via the characters
of G, which are then themselves a locally compact abelian group)
and yields a formula similar to (55). In noncommutative setups, this
approach is more difficult but nonetheless there exist some important
results such as a result of Hunt for Lie groups [100]. (I learnt of this
in a short but very informative account [7].)

I am not aware of any result in the case where the group G is
replaced by a non-abelian monoid such as 𝒫, but it is easy to imagine
that any Levy process in 𝒫 (i.e., a process which satisfies (54)) may
be described by a measure ν on the space 𝒫, which specifies the
infinitesimal rate at which we multiply the current partition Π_t by
π:

$$ \Pi_t \longmapsto \Pi_t \star \pi \quad \text{at rate } \nu(d\pi). \tag{57} $$

Now, note that in our situation, we have some further information
available: we need our process to be exchangeable, and to not have
more than 1 merger at a time, i.e., to be what we called simple. Thus
ν must be supported on partitions with only one nontrivial block, and
must be exchangeable. By De Finetti's theorem, the only possible
way to do that is to have a (possibly random) number 0 ≤ p ≤ 1,
and have every integer i take part in that block by tossing a coin
with success probability p. Let K_p denote this random partition.

Definition 3.3. The operation π ↦ π ⋆ K_p is called a p-merger of
the partition π.
In words, for every block of the partition π, we toss a coin whose
probability of heads is p. We then merge all the blocks that come
up heads. That is, we coalesce a fraction p of all blocks through
independent coin tosses. Therefore, we see that (57) transforms into:
given a coalescent with multiple collisions (Π_t, t ≥ 0), there exists a
measure ν on [0, 1] such that

at rate ν(dp): perform a p-merger.

If this is indeed the case, then note that the numbers λ_{b,k} satisfy:

$$ \lambda_{b,k} = \int_0^1 p^{k}(1-p)^{b-k}\,\nu(dp). $$

This is now looking very close to the statement of Theorem 3.1. It
remains to see why ν(dp) may be written as p^{-2}Λ(dp) where Λ is a
finite measure. (This is, naturally, the equivalent of the integrability
condition (56) in this setup.) However, this is easy to see: imagine
that there are currently n blocks. If p is very small, in fact small
enough that only 1 block takes part in the p-merger, then we may as
well ignore this event since it has no effect on the process. In order
to have a well-defined process, it thus suffices that the rate at which
at least two blocks merge is finite (when there is a finite number of
blocks), and thus this condition reads:

$$ \int_0^1 p^{2}\,\nu(dp) < \infty. \tag{58} $$

Naturally, this is the same as asking that ν(dp) can be written as
p^{-2}Λ(dp) for some finite measure Λ. Thus

$$ \lambda_{b,k} = \int_0^1 p^{k-2}(1-p)^{b-k}\,\Lambda(dp) \tag{59} $$

for some finite measure Λ on [0, 1]. This is precisely the content of
Theorem 3.1. □
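A quick exact check that rates of the form (59) automatically satisfy the consistency recursion (52): the Python sketch below takes Λ to be a finite sum of weighted Dirac masses (an assumption made only so the integral is a finite sum) and verifies (52) with rational arithmetic. It also checks the p-merger reading of (59): an atom of mass w at p > 0 makes some merger among b blocks happen at total rate w p^{-2}(1 − (1−p)^b − bp(1−p)^{b−1}). Function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def merger_rate(b, k, atoms):
    """lambda_{b,k} = int x^{k-2} (1-x)^{b-k} Lambda(dx) for Lambda = sum_i w_i delta_{x_i}.

    `atoms` is a list of (x_i, w_i) pairs with 0 <= x_i <= 1 and w_i > 0.
    An atom at x = 0 only contributes when k = 2 (Python evaluates 0**0 as 1).
    """
    return sum(w * x ** (k - 2) * (1 - x) ** (b - k) for x, w in atoms)


def total_rate(b, atoms):
    """Total rate at which some merger occurs among b blocks: sum_k C(b,k) lambda_{b,k}."""
    return sum(comb(b, k) * merger_rate(b, k, atoms) for k in range(2, b + 1))


atoms = [(Fraction(0), Fraction(1, 3)), (Fraction(1, 4), Fraction(2)), (Fraction(1), Fraction(1, 2))]
ok = all(merger_rate(b, k, atoms) == merger_rate(b + 1, k, atoms) + merger_rate(b + 1, k + 1, atoms)
         for b in range(2, 12) for k in range(2, b + 1))
print(ok)  # True
```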

I do not know whether this approach has ever been made rigorous
(or has ever been attempted). The problem, of course, is that
the Levy-Ito decomposition (57) is not known a priori for general
monoids. This is a pity, as I think this approach is more satisfying.

However, all this digression has not been in vain, as on the way we
have discovered the probabilistic structure of a general Λ-coalescent.
It also gives us a nice construction of such processes in terms of a
Poisson point process. This construction is often referred to as the
Poissonian construction. (As explained above at length, this should
really be regarded as the Levy-Ito decomposition of the Levy process
(Π_t, t ≥ 0).) We summarise it below. The construction is easier when
Λ has no mass at 0: Λ({0}) = 0.

Theorem 3.2. Let Λ be a finite measure on [0, 1] such that Λ({0}) = 0.
Let (p_i, t_i)_{i≥1} be the points of a Poisson point process on (0, 1] ×
ℝ_+ with intensity p^{-2}Λ(dp) dt. The process (Π_t, t ≥ 0) may be
constructed by saying that, for each point (p_i, t_i) of the point process,
we perform a p_i-merger at time t_i.

Recall that a p-merger is simply defined by saying that we merge
a proportion p of all blocks by independent coin tosses. This
construction is well defined because the restriction Π^n of Π to 𝒫_n is
well defined for every n ≥ 1, thanks to the remark that the total
rate at which pairs coalesce is finite (58). As usual, since the
restrictions Π^n are consistent, this uniquely defines a process Π on 𝒫,
which has the property that Π|_{[n]} = Π^n.

We stress that this structure theorem is often the key to proving
results about Λ-coalescents, so we advise the reader to make sure
this sinks in before proceeding further. At the risk of boring
the reader, here it is again in simple words: a coalescent process
with multiple collisions is entirely specified by a finite measure Λ on
[0, 1]: p^{-2}Λ(dp) gives us the rate at which a fraction p of all blocks
coalesces (at least when Λ({0}) = 0).
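The Poissonian construction of Theorem 3.2 translates almost literally into a simulation of the restriction Π^n. The Python sketch below assumes, for simplicity, that Λ is a finite sum of weighted Dirac masses at points of (0, 1]. In that case p^{-2}Λ(dp) has finite total mass and the Poisson points (p_i, t_i) can be generated one at a time. Measures such as the uniform one would instead require truncating small p, which this sketch does not attempt. Function names are hypothetical.

```python
import random


def p_merger(blocks, p, rng):
    """Toss an independent p-coin for every block and merge the blocks showing heads."""
    heads, tails = [], []
    for block in blocks:
        (heads if rng.random() < p else tails).append(block)
    if len(heads) < 2:
        return blocks                          # at most one block selected: no change
    merged = sorted(x for block in heads for x in block)
    return sorted(tails + [merged], key=min)   # order blocks by least element


def poissonian_coalescent(n, atoms, t_max, rng):
    """Lambda-coalescent on {1, ..., n} for Lambda = sum of w * delta_p over (p, w) in atoms.

    Assumes every atom p lies in (0, 1], so p^{-2} Lambda(dp) has finite total mass.
    Returns [(t, partition)] recorded at time 0 and after every effective merger.
    """
    probs = [p for p, _ in atoms]
    rates = [w / p ** 2 for p, w in atoms]     # intensity p^{-2} Lambda({p})
    total = sum(rates)
    blocks = [[i] for i in range(1, n + 1)]
    history = [(0.0, blocks)]
    t = rng.expovariate(total)
    while t <= t_max and len(blocks) > 1:
        p = rng.choices(probs, weights=rates)[0]   # which atom this Poisson point uses
        new_blocks = p_merger(blocks, p, rng)
        if new_blocks is not blocks:
            blocks = new_blocks
            history.append((t, blocks))
        t += rng.expovariate(total)
    return history


rng = random.Random(7)
for t, partition in poissonian_coalescent(10, [(0.3, 1.0), (1.0, 0.5)], 20.0, rng):
    print(round(t, 3), len(partition))
```

Because each block tosses its own coin, restricting the output to {1, ..., m} gives the same dynamics as running the construction with m singletons, which mirrors the consistency argument above.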

The case where Λ has an atom at 0, say Λ({0}) = ρ for some
ρ > 0, is not much different. It can be seen from (53) that this
number comes into play only if k = 2: that is, for binary mergers.
It is easy to see what happens: decomposing

$$ \Lambda(dp) = \rho\,\delta_0(dp) + \tilde\Lambda(dp), \tag{60} $$

where Λ̃ has no mass at 0, the dynamics of (Π_t, t ≥ 0) can be
described by saying that, in addition to the Poisson point process of
p-mergers governed by p^{-2}Λ̃(dp), every pair of blocks merges at rate
ρ. More formally:

Corollary 3.1. Let Λ be a finite measure on [0, 1] and let ρ := Λ({0}).

Let (p_i, t_i)_{i≥1} be the points of a Poisson point process on (0, 1] × ℝ_+
with intensity p^{-2}Λ̃(dp) dt, where Λ̃ is defined by (60). The process
(Π_t, t ≥ 0) may be constructed by saying that, for each point (p_i, t_i) of
the point process, we perform a p_i-merger at time t_i, and, in addition,
every pair of blocks merges at rate ρ.

Thus, the presence of an atom at zero adds a "Kingman
component" to the Λ-coalescent. We will see below that, indeed, when Λ
is purely a Dirac mass at 0, the corresponding Λ-coalescent is
Kingman's coalescent (sped up by an appropriate factor corresponding to
the mass of this atom).

To conclude this section on Pitman's structure theorems, we give
the following additional interpretation for the significance of the
measure Λ.

Theorem 3.3. Let Λ be a finite measure on [0, 1] with no mass at
zero. Let (Π_t, t ≥ 0) be a Λ-coalescent. Let T be the first time that
1 and 2 are in the same block. Then T is an exponential random
variable with parameter Λ([0, 1]). Moreover, if F is the fraction of
blocks that take part in the merger occurring at time T, then F is a
random variable in (0, 1], with law:

$$ P(F \in dx) = \frac{\Lambda(dx)}{\Lambda([0,1])}. \tag{61} $$

In other words, the finite measure Λ, normalised to be a probability
measure, gives us the law of the fraction of blocks that are coalescing
when two given integers first become part of the same block.

Proof. The proof is obvious from Theorem 3.2. Until 1 and 2
coalesce, their respective blocks are always B_1 and B_2 because of our
convention to label blocks by increasing order of their least elements.
Given an atom of size p, the probability that 1 and 2 coalesce is
precisely p^2. Thus, define a thinning of the Poisson point process
(p_i, t_i)_{i≥1}, where each mark (p_i, t_i) is kept with probability p_i^2. By
the classical theory of Poisson point processes, the resulting point process is
also Poisson, but with intensity measure Λ(dp) ⊗ dt. Thus the rate
at which they coalesce is precisely Λ([0, 1]), and the first point of this
process has a distribution which is proportional to Λ. □

3.1.4 Examples

It is high time to give a few examples of Λ-coalescents. Naturally,
the main example of a Λ-coalescent is that of Kingman's coalescent:
Example 1. Let Λ be the unit Dirac mass at 0:

$$ \Lambda(dx) = \delta_0(dx). $$

In that case, (53) translates into λ_{b,k} = 0 except if k = 2, in which
case λ_{b,2} = 1. Thus the corresponding Λ-coalescent is nothing but
Kingman's coalescent (every pair of blocks merges at rate 1).

Example 2. Another measure which will play an important role
towards the end of these notes is the case where Λ(dx) = dx is
the uniform measure on (0, 1). In this case the Λ-coalescent is known
as the Bolthausen-Sznitman coalescent, and the transition rates λ_{b,k}
can be computed more explicitly as

$$ \lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\,dx = \frac{(k-2)!\,(b-k)!}{(b-1)!}. \tag{62} $$

The Bolthausen-Sznitman coalescent first arose in connection with
the physics of spin glasses, an area about which we will say a few words
at the end of these notes. But this is not the only area for which
this process is relevant: for instance, we will see that it describes
the statistics of a certain combinatorial model of random trees and
is thought to be a universal scaling limit in a wide variety of
models which can be described by "random travelling waves": all those
topics will be (briefly) discussed in that last chapter.

Example 3. Let 0 < α < 2. Assume that Λ(dx) is the Beta(2 − α, α)
distribution:

$$ \Lambda(dx) = \frac{1}{\Gamma(2-\alpha)\,\Gamma(\alpha)}\,x^{1-\alpha}(1-x)^{\alpha-1}\,dx. \tag{63} $$

The resulting coalescent is simply called the Beta-coalescent with
parameter α. It is an especially important family of coalescent
processes, for both theoretical and practical reasons: on the one hand
we will see that they are related to the genealogy of populations
with large variation in the offspring distribution, and on the other
hand, they are intimately connected with the properties of an object
known as the stable Continuum Random Tree. This correspondence
and its consequences will be discussed in the next chapter.

Example 4. A peculiar coalescent arises if Λ is simply taken to be a
Dirac mass at p = 1. In that case, nothing happens for an exponential
amount of time with mean 1, at which point all blocks coalesce
into 1. The corresponding coalescent tree is then star-shaped: there
is one root and an infinite number of leaves connected to it. For this
reason some authors call this process the star-shaped coalescent.
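For the Beta(2 − α, α) measure (63), the integral in (53) is again a Beta integral, λ_{b,k} = B(k − α, b − k + α)/B(2 − α, α). The Python sketch below evaluates it through `math.lgamma` (the standard `math` module has no Beta function, so `beta_fn` is a hypothetical helper). It checks that α = 1 reproduces the Bolthausen-Sznitman rates (62). It also shows that, as α → 2, λ_{b,2} approaches 1 and λ_{b,k} approaches 0 for k ≥ 3, i.e., the rates of Example 1. This is consistent with, but of course does not prove, the process-level convergence of Theorem 3.4 below.

```python
from math import exp, factorial, lgamma


def beta_fn(x, y):
    """Euler Beta function B(x, y) via log-Gamma (math has no beta() of its own)."""
    return exp(lgamma(x) + lgamma(y) - lgamma(x + y))


def beta_coalescent_rate(b, k, alpha):
    """lambda_{b,k} from (53) for Lambda = Beta(2 - alpha, alpha), 0 < alpha < 2.

    int_0^1 x^{k-2} (1-x)^{b-k} x^{1-alpha} (1-x)^{alpha-1} dx / B(2-alpha, alpha)
        = B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    """
    return beta_fn(k - alpha, b - k + alpha) / beta_fn(2 - alpha, alpha)


def bolthausen_sznitman_rate(b, k):
    """(62): lambda_{b,k} = (k-2)! (b-k)! / (b-1)! for Lambda(dx) = dx on (0, 1)."""
    return factorial(k - 2) * factorial(b - k) / factorial(b - 1)


for alpha in (1.0, 1.5, 1.99):
    print(alpha, [round(beta_coalescent_rate(5, k, alpha), 4) for k in range(2, 6)])
```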

Let 0 < α < 2, and consider the Beta-coalescent (Π_t, t ≥ 0)
defined by (63) in Example 3. Note that when α = 1, this is just
the Bolthausen-Sznitman coalescent, while when α → 2^-, the Beta
distribution approximates the Dirac mass at 0, and hence if μ_α
denotes the distribution (63), we have:

$$ \mu_\alpha \Rightarrow \delta_0 \qquad (\alpha \to 2^-), $$

where ⇒ denotes vague convergence (convergence in distribution).
Thus one should think of a Beta-coalescent with 1 < α < 2 as
some kind of interpolation between Kingman's coalescent and the
Bolthausen-Sznitman coalescent. In fact, we have:

Theorem 3.4. Let $(\Pi^{(\alpha)}_t, t \ge 0)$ denote a Beta-coalescent with $1 < \alpha < 2$. Then as $\alpha \to 2$ from below, we have:

$$(\Pi^{(\alpha)}_t, t \ge 0) \xrightarrow{d} (\Pi_t, t \ge 0),$$

where $\Pi$ is Kingman's coalescent, and $\xrightarrow{d}$ stands for convergence in distribution in the Skorokhod space $D(\mathbb{R}_+, \mathcal{P})$.
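A short numerical sketch of Theorem 3.4, assuming that the rate (53) is given by the usual formula $\lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\,\Lambda(dx)$. For the density (63) this is a ratio of Beta functions, and as $\alpha \to 2^-$ the pairwise rate $\lambda_{b,2}$ should approach Kingman's value 1 while the rates of triple and larger mergers vanish. The function name `beta_merger_rate` is ours.

```python
from math import exp, lgamma


def beta_merger_rate(b: int, k: int, alpha: float) -> float:
    """lambda_{b,k} for Lambda = Beta(2 - alpha, alpha), 0 < alpha < 2.

    With lambda_{b,k} = int_0^1 x^(k-2) (1-x)^(b-k) Lambda(dx) and the density
    (63), this equals B(k - alpha, b - k + alpha) / B(2 - alpha, alpha).
    Log-gamma keeps the computation stable when 2 - alpha is tiny.
    """
    log_num = lgamma(k - alpha) + lgamma(b - k + alpha) - lgamma(b)
    log_den = lgamma(2 - alpha) + lgamma(alpha)  # lgamma(2) = 0
    return exp(log_num - log_den)


# Rates for b = 4 blocks: pairwise, triple and quadruple mergers.
for alpha in (1.5, 1.9, 1.99, 1.999):
    rates = [beta_merger_rate(4, k, alpha) for k in (2, 3, 4)]
    print(alpha, [round(r, 4) for r in rates])
```

For $\alpha = 1$ the same function returns the Bolthausen-Sznitman rates $(k-2)!\,(b-k)!/(b-1)!$.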

An illustration of this result is given in Figure 7, which was gen- 
erated by Emilia Huerta-Sanchez, whom I thank very much for al- 
lowing me to use this picture. 

3.1.5 Coming down from infinity 

Fix a finite measure $\Lambda$ on $[0,1]$, and consider a $\Lambda$-coalescent $(\Pi_t, t \ge 0)$. One of the first things we saw for Kingman's coalescent is that it comes down from infinity, meaning that almost surely, after any positive amount of time, the total number of blocks has been reduced to a finite number (Theorem 2.1). Given that in Kingman's coalescent only binary mergers are possible, while here many more blocks may merge at once, one may naively think that this should also be true for every $\Lambda$-coalescent. In fact, such is not the case, and whether or not a given $\Lambda$-coalescent comes down from infinity depends on the measure $\Lambda$. This is part of a more general paradox, which we will describe in more detail later, that Kingman's coalescent is actually the one in which coalescence is strongest (see Corollary 4.2).

Figure 7: The Beta-coalescent for two different values of the parameter $\alpha$: top, $\alpha = 1.2$; bottom, $\alpha = 1.9$. Courtesy of Emilia Huerta-Sanchez.

It is easy to see that there exist some $\Lambda$-coalescents which do not come down from infinity. Indeed, for some measures $\Lambda$, $\Pi_t$ has a positive fraction of dust for every $t > 0$ almost surely (and hence an infinite number of blocks):

Theorem 3.5. Let $D$ be the event that for every $t > 0$, $\Pi_t$ has some singletons. Then $P(D) = 1$ if and only if

$$\int_0^1 x^{-1}\,\Lambda(dx) < \infty. \qquad (64)$$

In the opposite case, $P(D) = 0$.

Proof. (sketch) If this integral is finite, then the rate at which a given block takes part in a merger is finite, and so after any given amount of time, there remains a positive fraction of singletons that have never taken part in a merger. The converse uses a zero-one law for $\Pi$ along the lines of Blumenthal's zero-one law (details in [131]). □

For instance, if $\alpha < 1$, then the Beta-coalescent has a positive fraction of singletons at all times, while this fails if $\alpha \ge 1$. In particular, the Bolthausen-Sznitman coalescent does not have any dust. We will see below that the Bolthausen-Sznitman coalescent (which, we remind the reader, corresponds to the case $\alpha = 1$ of the Beta-coalescent) is a two-sided borderline case, in the sense that it does not come down from infinity but has no dust. However, if $\alpha$ is larger than 1, then the corresponding coalescent comes down from infinity (Corollary 3.2), and if it is smaller, then the coalescent has dust with probability 1.

What are the conditions on $\Lambda$ to ensure coming down from infinity? The first step is to show that if the number of blocks ever becomes finite, this must happen immediately after time zero, except in the case of the star-shaped coalescent.

Theorem 3.6. Assume that $\Lambda(\{1\}) = 0$. Let $E$ be the event that for every $t > 0$, $\Pi_t$ has only finitely many blocks. Let $F$ be the event that $\Pi_t$ has infinitely many blocks for all $t > 0$. Then

$$P(E) = 1 \quad \text{or} \quad P(F) = 1.$$

If $P(E) = 1$ we say that the coalescent comes down from infinity, and if $P(F) = 1$ we say that the process stays infinite.

Clearly, the assumption $\Lambda(\{1\}) = 0$ is essential, as otherwise all blocks coalesce in finite positive time. This possibility being taken away, the argument is a (fairly simple) application of the zero-one law mentioned above together with the consistency property. See [131] for details.
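For the Beta-coalescent of Example 3, condition (64) of Theorem 3.5 can be evaluated in closed form. The short computation below is ours (it only uses the density (63) and the Beta integral) and recovers the dichotomy stated after Theorem 3.5:

$$\int_0^1 x^{-1}\,\Lambda(dx) = \frac{1}{\Gamma(2-\alpha)\Gamma(\alpha)}\int_0^1 x^{-\alpha}(1-x)^{\alpha-1}\,dx = \frac{\Gamma(1-\alpha)\Gamma(\alpha)}{\Gamma(2-\alpha)\Gamma(\alpha)} = \frac{1}{1-\alpha}, \qquad 0 < \alpha < 1,$$

using $\Gamma(2-\alpha) = (1-\alpha)\Gamma(1-\alpha)$. For $1 \le \alpha < 2$ the integrand behaves like $x^{-\alpha}$ near 0, so the integral is infinite and, by Theorem 3.5, there is no dust.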

[131] left open the question of finding a practical criterion for deciding whether a given $\Lambda$-coalescent comes down from infinity. The first answer came from the PhD thesis of Jason Schweinsberg, who proved a necessary and sufficient condition for this. We describe his result now. Given a finite measure $\Lambda$, let $\lambda_{b,k}$ denote the rate (53) and let

$$\lambda_b = \sum_{k=2}^{b} \binom{b}{k} \lambda_{b,k}. \qquad (65)$$

Note that $\lambda_b$ is the total coalescence rate when there are $b$ blocks. It turns out that the relevant quantity is the number

$$\gamma_b = \sum_{k=2}^{b} (k-1) \binom{b}{k} \lambda_{b,k}. \qquad (66)$$
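The sums (65) and (66) are easy to evaluate for the Beta-coalescent of Example 3. The sketch below (the function name `total_rates` is hypothetical) assumes $\lambda_{b,k} = B(k-\alpha, b-k+\alpha)/B(2-\alpha,\alpha)$, as obtained from (63), and uses the consequence of $\Gamma(z+1) = z\Gamma(z)$ that consecutive terms $t_k = \binom{b}{k}\lambda_{b,k}$ satisfy $t_{k+1}/t_k = (b-k)(k-\alpha)/((k+1)(b-k-1+\alpha))$, which avoids evaluating every binomial coefficient and Beta function separately.

```python
from math import comb, exp, lgamma


def total_rates(b: int, alpha: float) -> tuple[float, float]:
    """Return (lambda_b, gamma_b) of (65)-(66) for the Beta(2 - alpha, alpha)
    coalescent with b >= 2 blocks.

    t_k = C(b, k) * lambda_{b,k} is the total rate of k-mergers. We start from
    t_2 = C(b, 2) * Gamma(b - 2 + alpha) / (Gamma(b) * Gamma(alpha)) and use
    t_{k+1} / t_k = (b - k)(k - alpha) / ((k + 1)(b - k - 1 + alpha)).
    """
    t = comb(b, 2) * exp(lgamma(b - 2 + alpha) - lgamma(b) - lgamma(alpha))
    lam = gam = 0.0
    for k in range(2, b + 1):
        lam += t              # contribution to lambda_b, (65)
        gam += (k - 1) * t    # a k-merger removes k - 1 blocks, (66)
        if k < b:
            t *= (b - k) * (k - alpha) / ((k + 1) * (b - k - 1 + alpha))
    return lam, gam


print(total_rates(3, 1.0))   # Bolthausen-Sznitman, b = 3
print(total_rates(4, 2.0))   # formally alpha = 2: Kingman's rates
```

Plugging in $\alpha = 2$ formally kills every term with $k \ge 3$ and returns Kingman's values $\lambda_b = \gamma_b = b(b-1)/2$, in line with Theorem 3.4.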

To explain the relevance of this quantity, note that if there are currently $N_t = b$ blocks, then after $dt$ units of time

$$E(N_{t+dt} \mid N_t = b) = b - \gamma_b\,dt, \qquad (67)$$

since if a $k$-tuple of blocks merges, then this corresponds to a decrease of $N_t$ by $k-1$. Define a function $\gamma : \mathbb{R}_+ \to \mathbb{R}_+$ by putting $\gamma(b) := \gamma_{\lfloor b \rfloor}$ for all $b \in \mathbb{R}_+$. Following the differential equation heuristics (23) already used for Kingman's coalescent, we see that if $u(t) = E(N_t)$, then from (67) we expect $u(t)$ to approximately solve the differential equation:

$$\begin{cases} u'(t) = -\gamma(u(t)), \\ u(0) = \infty. \end{cases} \qquad (68)$$

Forgetting about problems such as discontinuities of $\gamma$ and rigour in general, we get by solving formally the differential equation (68):

$$\int_0^t \frac{u'(s)}{\gamma(u(s))}\,ds = -t, \qquad (69)$$

so that, making the change of variable $x = u(s)$,

$$\int_{u(t)}^{\infty} \frac{dx}{\gamma(x)} = t. \qquad (70)$$
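As a sanity check of this heuristic (a worked example added here, not part of the original argument), take Kingman's coalescent, $\Lambda = \delta_0$. Then only pairwise mergers occur, $\lambda_{b,2} = 1$, and (66) gives $\gamma_b = b(b-1)/2$, so that for small $t$, (70) reads

$$t = \int_{u(t)}^{\infty} \frac{dx}{\gamma(x)} \approx \int_{u(t)}^{\infty} \frac{2\,dx}{x^2} = \frac{2}{u(t)}, \qquad \text{i.e.} \qquad u(t) \approx \frac{2}{t},$$

which matches the well-known behaviour $N_t \sim 2/t$ of Kingman's coalescent as $t \to 0$.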

We see hence that $u(t)$ is finite for $t > 0$ if and only if $\int^{\infty} dx/\gamma(x) < \infty$. Remembering that $\gamma(x) = \gamma_{\lfloor x \rfloor}$ leads us to Schweinsberg's criterion [142]:

Theorem 3.7. Let $\Lambda$ be a finite measure on $[0,1]$. The associated $\Lambda$-coalescent comes down from infinity if and only if

$$\sum_{b=2}^{\infty} \gamma_b^{-1} < \infty. \qquad (71)$$
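Before sketching the proof, criterion (71) can be looked at numerically for the Beta-coalescents of Example 3. The sketch below (hypothetical names `gamma_b`, `partial_sums`) reuses the term-ratio recursion for $\gamma_b$ and tabulates partial sums of $\sum_b \gamma_b^{-1}$. For $1 < \alpha < 2$ the increments die out quickly, while for $\alpha = 1$ (Bolthausen-Sznitman) they shrink only logarithmically, consistent with Corollary 3.2 below. Finite partial sums cannot prove convergence or divergence; this is only an illustration.

```python
from math import comb, exp, lgamma


def gamma_b(b: int, alpha: float) -> float:
    """gamma_b of (66) for the Beta(2 - alpha, alpha)-coalescent, using the
    ratio t_{k+1}/t_k = (b - k)(k - alpha) / ((k + 1)(b - k - 1 + alpha))
    between consecutive terms t_k = C(b, k) * lambda_{b,k}."""
    t = comb(b, 2) * exp(lgamma(b - 2 + alpha) - lgamma(b) - lgamma(alpha))
    total = 0.0
    for k in range(2, b + 1):
        total += (k - 1) * t
        if k < b:
            t *= (b - k) * (k - alpha) / ((k + 1) * (b - k - 1 + alpha))
    return total


def partial_sums(n: int, alpha: float) -> list[float]:
    """sums[m - 2] = sum_{b=2}^{m} 1 / gamma_b, for m = 2, ..., n."""
    sums, acc = [], 0.0
    for b in range(2, n + 1):
        acc += 1.0 / gamma_b(b, alpha)
        sums.append(acc)
    return sums


for alpha in (1.0, 1.5):
    s = partial_sums(1000, alpha)
    # partial sums up to b = 500 and b = 1000, and the increment between them
    print(alpha, round(s[498], 4), round(s[998], 4), round(s[998] - s[498], 4))
```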

Proof. (sketch) We will sketch the important steps that lead to Theorem 3.7. The first one is to define $T_n$, the time it takes for the first $n$ integers to coalesce into a single block. Then we naturally have $0 = T_1 < T_2 \le \dots \le T_n$, and the coalescent comes down from infinity if and only if $T_\infty := \lim_{n \to \infty} T_n < \infty$ almost surely.

Assume that (71) holds. Fix $n \ge 1$ and consider the restriction $\Pi^n$ of $\Pi$ to $[n]$. Let $R_0 = 0$, and define $R_i$ to be the $i$-th time at which $\Pi^n$ loses at least one block; if there is only one block left, then define $R_i = R_{i-1}$. Thus $R_{n-1} = T_n$, as after $n-1$ coalescences we are sure to be done. Hence, if $L_i = R_i - R_{i-1}$, we have $E(T_n) = \sum_{i=1}^{n-1} E(L_i)$. Now, conditioning upon $N_{i-1} := N^n_{R_{i-1}}$, the number of blocks at time $R_{i-1}$, we see that $L_i$ is exponentially distributed with rate $\lambda_{N_{i-1}}$ so long as $N_{i-1} > 1$. Thus

$$E(T_n) = \sum_{i=1}^{n-1} E\left(\lambda_{N_{i-1}}^{-1}\,\mathbf{1}_{\{N_{i-1} > 1\}}\right). \qquad (72)$$

Observe that, if $J_i = N_{i-1} - N_i$ is the decrease of $N$ at this collision, we have

$$P(J_i = k-1 \mid N_{i-1} = b) = \frac{\binom{b}{k}\lambda_{b,k}}{\lambda_b},$$

and thus $E(J_i \mid N_{i-1} = b) = \gamma_b/\lambda_b$. Plugging this into (72) yields:

$$E(T_n) = \sum_{i=1}^{n-1} E\left(\gamma_{N_{i-1}}^{-1}\,E(J_i \mid N_{i-1})\right),$$

since when $N_{i-1} = 1$, $J_i = 0$ anyway. It follows that

$$E(T_n) = E\left(\sum_{i=1}^{n-1} \gamma_{N_{i-1}}^{-1} J_i\right). \qquad (73)$$

Looking at the random variable in the expectation on the right-hand side of (73), $X = \sum_{i=1}^{n-1} \gamma_{N_{i-1}}^{-1} J_i$, we see that, intuitively speaking, this random variable is very close to $\sum_{b=2}^{n} \gamma_b^{-1}$: the only difference is that, at the $i$-th collision, the terms $\gamma_b^{-1}$ for $N_i < b \le N_{i-1}$ are all replaced by $\gamma_{N_{i-1}}^{-1}$, repeated exactly $J_i$ times. Thus if $J_i$ isn't too big and if $\gamma_b$ doesn't have too wild a behaviour, it is easy to understand how this yields the desired result. For instance, we get an easy upper bound by monotonicity: some simple convexity arguments show that $\gamma_b$ is nondecreasing in $b$, and hence

$$E(T_n) = E(X) \le \sum_{b=2}^{n} \gamma_b^{-1}.$$

Thus under the assumption (71), we get by the monotone convergence theorem $E(T_\infty) < \infty$, and thus the coalescent comes down from infinity.
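The identity (72) can also be turned into an exact computation of $E(T_n)$: by first-step analysis on the block-counting chain of $\Pi^n$, one gets a recursion over the number of blocks. The sketch below (hypothetical helper names, Beta-coalescent rates as in Example 3) does this and compares the result with the upper bound $\sum_{b=2}^n \gamma_b^{-1}$ derived above; for Kingman's coalescent (formally $\alpha = 2$ in the recursion) both sides equal $2(1 - 1/n)$.

```python
from math import comb, exp, lgamma


def k_merger_rates(b: int, alpha: float) -> list[float]:
    """t[k] = C(b, k) * lambda_{b,k}, k = 2..b, for the Beta(2 - alpha, alpha)
    coalescent (t[0] = t[1] = 0 are padding)."""
    t = [0.0] * (b + 1)
    t[2] = comb(b, 2) * exp(lgamma(b - 2 + alpha) - lgamma(b) - lgamma(alpha))
    for k in range(2, b):
        t[k + 1] = t[k] * (b - k) * (k - alpha) / ((k + 1) * (b - k - 1 + alpha))
    return t


def expected_times(n: int, alpha: float) -> list[float]:
    """e[b] = E(time for the block-counting chain of Pi^n to go from b to 1).

    First-step analysis: from b blocks, wait an exponential time with mean
    1 / lambda_b, then a k-merger (probability t[k] / lambda_b) leaves
    b - k + 1 blocks. Hence e[n] = E(T_n)."""
    e = [0.0] * (n + 1)
    for b in range(2, n + 1):
        t = k_merger_rates(b, alpha)
        e[b] = (1.0 + sum(t[k] * e[b - k + 1] for k in range(2, b + 1))) / sum(t)
    return e


def upper_bound(n: int, alpha: float) -> float:
    """sum_{b=2}^{n} 1 / gamma_b, the bound obtained from (73)."""
    total = 0.0
    for b in range(2, n + 1):
        t = k_merger_rates(b, alpha)
        total += 1.0 / sum((k - 1) * t[k] for k in range(2, b + 1))
    return total


n = 50
for alpha in (1.2, 1.5, 1.9):
    print(alpha, expected_times(n, alpha)[n], upper_bound(n, alpha))
```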

The other direction is a little more delicate, and the main thing to be proved is that if the coalescent comes down from infinity, i.e., if $T_\infty < \infty$, then in fact this random variable must have finite expectation. Granted that, a dyadic argument applied to (73) does the trick. Thus we content ourselves with verifying:

Lemma 3.1. The coalescent comes down from infinity if and only if $E(T_\infty) < \infty$.

Proof. Let $A_m$ be the event that $T_m > T_{m-1}$, that is, at time $T_{m-1}$, $\Pi^m$ still has two blocks. Then the expected time it takes for these two blocks to coalesce is just $\lambda_{2,2}^{-1} =: \beta$. Thus

$$E(T_\infty) = \sum_{m=2}^{\infty} E(T_m - T_{m-1}) = \beta \sum_{m=2}^{\infty} P(A_m).$$

Hence, assuming $E(T_\infty) = \infty$, we get $\sum_{m=2}^{\infty} P(A_m) = \infty$. An application of the martingale version of the Borel-Cantelli lemma then shows that $A_m$ occurs infinitely often almost surely. When this is so, $T_\infty$ is bounded below by the sum of infinitely many i.i.d. exponential random variables with mean $\beta$, and hence $T_\infty = \infty$ almost surely. This finishes the proof of Lemma 3.1. □
The reader is referred to the original paper [142] for more details about the rest of the proof. □

As an application of this criterion, it is easy to conclude:

Corollary 3.2. Let $0 < \alpha < 2$. The Beta-coalescent with parameter $\alpha$ comes down from infinity if and only if $\alpha > 1$. In particular, the Bolthausen-Sznitman coalescent does not come down from infinity.

We will see later that another (equivalent) criterion for coming down from infinity is that

$$\int_1^{\infty} \frac{dq}{\psi(q)} < \infty, \qquad (74)$$

where $\psi(q) = \int_0^1 (e^{-qx} - 1 + qx)\,x^{-2}\,\Lambda(dx)$ is the Laplace exponent of a certain Lévy process. As we will see, this criterion is related to critical properties of continuous-state branching processes. There is in fact a strong connection between $\Lambda$-coalescents and these branching processes; this connection will be explored in the next section, and this will give a different proof of Theorem 3.7. Along the way, we will be able to make rigorous the limit theorem which is suggested by the heuristic approach outlined before this theorem: that is, for small times $t > 0$,

$$N_t \sim u(t) \quad \text{almost surely as } t \to 0,$$

where $u(t)$ is defined by (70).

3.2 A Hitchhiker's guide to the genealogy

This section is devoted to the study of a few simple models where the genealogy is well approximated by $\Lambda$-coalescents. There are a number of models where such convergence is discussed. For instance, Sagitov [139] gave a simple model which is closely related in spirit to the first one we will be studying. (Remarkably, that paper was published simultaneously with that of Pitman [131] and, although independent, it also contained a definition of $\Lambda$-coalescents, so that both Pitman and Sagitov share the credit for the discovery of this process.) We have chosen to discuss two main models. These are:
1. a Galton-Watson model due to Schweinsberg [144] where the offspring distribution is allowed to have heavy tails,

2. a model with selection and recombination (also known as hitchhiking), studied by Durrett and Schweinsberg [68].

We also note that recently, Eldon and Wakeley [70] came up with 
a model which illustrates further the impact of offspring variability 
and gives rise to A-coalescents for the genealogies. Some biological 
and statistical implications of these findings are discussed in [70] and 
[71]. 

3.2.1 A Galton-Watson model

We now describe the population model that we will work with in this section. This is a model derived from the well-known Galton-Watson branching process, but, unlike these processes, the population size is kept constant by a sampling mechanism: we assume that the offspring distribution of an individual has mean $1 < \mu < \infty$, so that by the law of large numbers, if there are $N$ individuals in the population at some time $t$, then the next generation consists of approximately $N\mu > N$ individuals. Instead of keeping all those $N\mu$ individuals alive, we declare that only $N$ of them survive, and they are chosen at random among the $N\mu$ individuals of that generation. Thus the population size is constant, equal to $N$. Formally, the model is defined generation by generation, in terms of i.i.d. offspring numbers $X_1, \dots, X_N$ (where the distribution of $X$ allows for heavy tails and is specified later), and of random variables $(\nu_i)_{i=1}^N$ which are exchangeable and have the property that $\sum_{i=1}^N \nu_i = N$. The variable $\nu_i$ corresponds to the actual offspring number of individual $i$ after the selection step. Note that this population model may be extended to a bi-infinite set of times $\mathbb{Z}$ by using i.i.d. copies $\{(\nu_i(t))_{i=1}^N, t \in \mathbb{Z}\}$: thus this model belongs to the class of Cannings population models discussed in Theorem 2.5.
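A one-generation simulation makes the selection step concrete. The sketch below is ours: anticipating the heavy-tailed assumption (75) below, it takes the hypothetical offspring law $X = \lfloor Y \rfloor$ with $Y$ Pareto($\alpha$), so that $X \ge 1$ and hence $S_N = X_1 + \dots + X_N \ge N$ always (this sidesteps what to do when fewer than $N$ offspring are born). It also estimates the pair-coalescence probability $c_N$ of Cannings models by averaging $\sum_i \nu_i(\nu_i - 1)/(N(N-1))$, the chance that two distinct individuals of the new generation share a parent, over independent generations.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def one_generation(N: int, alpha: float, rng: random.Random):
    """One generation of the Galton-Watson model with selection (sketch).

    Offspring numbers X_i = floor(Y_i), Y_i ~ Pareto(alpha): a hypothetical
    heavy-tailed choice with X_i >= 1, so S_N = X_1 + ... + X_N >= N.
    Exactly N of the S_N children are kept, uniformly without replacement.
    Returns (X, nu): offspring numbers before and after selection."""
    X = [int(rng.paretovariate(alpha)) for _ in range(N)]
    cum = list(accumulate(X))                    # cum[i] = X[0] + ... + X[i]
    nu = [0] * N
    for child in rng.sample(range(cum[-1]), N):  # the N surviving children
        nu[bisect_right(cum, child)] += 1        # credit the child's parent
    return X, nu


def estimate_cN(N: int, alpha: float, generations: int, seed: int = 1) -> float:
    """Monte Carlo estimate of c_N: average over generations of
    sum_i nu_i (nu_i - 1) / (N (N - 1))."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(generations):
        _, nu = one_generation(N, alpha, rng)
        total += sum(v * (v - 1) for v in nu) / (N * (N - 1))
    return total / generations


print(estimate_cN(200, 1.5, 200), estimate_cN(200, 3.0, 200))
```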

Having defined this population dynamics, we consider as usual the coalescing process obtained by sampling $n \le N$ individuals from the population at time 0, and considering their ancestral lineages: that is, let $(\Pi^{N,n}_t, t = 0, 1, \dots)$ be the $\mathcal{P}_n$-valued process defined by putting $i \sim j$ if and only if individuals $i$ and $j$ share the same ancestor at generation $-t$. This is the by-now familiar ancestral partition. We now specify the kind of offspring distribution we have in mind for the Galton-Watson process, which allows for heavy tails. We assume that there exist $\alpha \ge 1$ and $C > 0$ such that

$$P(X > x) \sim C x^{-\alpha}, \qquad x \to \infty. \qquad (75)$$

One can also think of the case

$$P(X = x) \sim C' x^{-\alpha-1}, \qquad x \to \infty, \qquad (76)$$

although (75) is a slightly weaker assumption and so we prefer to work with it. When $\alpha > 1$, $\mu := E(X) < \infty$, and we further assume that $E(X) > 1$, so that the underlying Galton-Watson mechanism is supercritical. Recall that in Cannings models, the correct time scale is given by the inverse coalescence probability $c_N^{-1}$, where:

$$c_N = \frac{E(\nu_1(\nu_1 - 1))}{N - 1}. \qquad (77)$$

As was already discussed, $c_N$ is the probability that two individuals sampled at random without replacement from generation 0 have the same parent at generation $-1$, and thus it is the probability of coalescence of any two lineages. Schweinsberg's result states that there is a phase transition at $\alpha = 2$ for the behaviour of the genealogies.

Theorem 3.8. Assume (75) and $\mu > 1$. For any $n \ge 1$, as $N \to \infty$:

1. If $\alpha \ge 2$, the genealogy converges to Kingman's coalescent.

2. If $1 \le \alpha < 2$, the genealogy converges to a Beta-coalescent with parameter $\alpha$.

As usual, the formal statement which is contained in the informal wording of the theorem is that, in the case $\alpha \ge 2$, $(\Pi^{N,n}_{\lfloor t/c_N \rfloor}, t \ge 0)$ converges to Kingman's $n$-coalescent for every $n \ge 1$, while it converges to the restriction of a Beta-coalescent to $[n]$ if $\alpha \in [1, 2)$.

Remark 3.1. Note that $\alpha = 2$ is precisely the critical value which delimits the convergence of the rescaled random walks ($S_n := \sum_{i=1}^n X_i$) towards a Brownian motion or a Lévy process with jumps. As we will see in the next chapter and in the appendix, this is not a coincidence: Galton-Watson trees can be described in terms of processes known as height functions, or contour processes, which are close relatives of random walks with step distribution $X$. If this step distribution is in the domain of attraction of a normal random variable, we thus expect a tree which is close to the Brownian continuum random tree, for which the genealogy is closely related to Kingman's coalescent, as is proved in [16] and will be shown later in these notes. On the contrary, if the step distribution is in the domain of attraction of a stable random variable with index $1 < \alpha < 2$, then the limiting tree is called the stable continuum random tree and its genealogy is known to be given by Beta-coalescents ([20]), as will be discussed in more detail later on.

Remark 3.2. When $\alpha < 1$, the coalescent obtained from the ancestral partitions converges to a coalescent with simultaneous multiple collisions. As we do not enter into the details of these processes in these notes, we only refer the reader to part (d) of Theorem 4 in [144].

Proof. We will go over a few of the important steps of the proof 
of Theorem 3.8, leaving as usual the more difficult details for the 
interested reader to find out in the original paper [144]. 

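Case 1. Let $\alpha > 2$. The main idea is to use Möhle's lemma (Theorem 2.5). Thus it suffices to check that (31) holds. Recall that this condition states that $E(\nu_1(\nu_1-1)(\nu_1-2))/(N^2 c_N) \to 0$; in terms of the offspring numbers before selection, it amounts here to

$$\lim_{N \to \infty} \frac{N}{c_N}\,E\left(\frac{X_1(X_1-1)(X_1-2)}{S_N^3}\,\mathbf{1}_{\{S_N \ge N\}}\right) = 0, \qquad (78)$$

where $X_1$ is the offspring number of individual 1 before selection, and $S_N = X_1 + \dots + X_N$. Now, it is easy to see, when $\alpha > 2$, that $c_N \approx N\,E(X_1(X_1-1)/S_N^2) \sim c/N$ for some $c > 0$. Thus (78) reduces to showing that

$$\lim_{N \to \infty} N^2\,E\left(\frac{X_1(X_1-1)(X_1-2)}{S_N^3}\,\mathbf{1}_{\{S_N \ge N\}}\right) = 0.$$

However, on the event $\{S_N \ge N\}$ we have $S_N \ge \max(X_1, N)$, so

$$\frac{X_1(X_1-1)(X_1-2)}{S_N^3} \le \frac{X_1^3}{\max(X_1^3, N^3)},$$

and thus

$$E\left(\frac{X_1(X_1-1)(X_1-2)}{S_N^3}\,\mathbf{1}_{\{S_N \ge N\}}\right) \le \sum_{k=0}^{N} \frac{k^3}{N^3}\,P(X = k) + P(X > N).$$

Multiplying by $N^2$ and using (75), it is easy to see that this converges to 0, and hence the condition (31) in Möhle's lemma is verified.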

Case 2. Assume that $1 < \alpha < 2$. There are two steps to verify. The first one is to compute the asymptotics of $c_N$, the scale parameter.

Lemma 3.2. We have the asymptotics, as $N \to \infty$:

$$c_N \sim C\,\alpha\,\mu^{-\alpha}\,B(2-\alpha, \alpha)\,N^{1-\alpha}, \qquad (79)$$

where $B(2-\alpha, \alpha) := \Gamma(\alpha)\Gamma(2-\alpha)$.

Proof. (sketch) Note that

$$c_N \approx N\,E\left(\frac{X_1(X_1-1)}{S_N^2}\,\mathbf{1}_{\{S_N \ge N\}}\right).$$

Write $S_N = X_1 + S'_N$, where $S'_N = X_2 + \dots + X_N$. By the law of large numbers, $S'_N \approx N\mu$, so

$$c_N \approx N\,E\left(\frac{X(X-1)}{(X+M)^2}\right),$$

with $M = N\mu$. Thus (79) follows from the statement

$$\lim_{M \to \infty} M^{\alpha}\,E\left(\frac{X(X-1)}{(X+M)^2}\right) = C\,\alpha\,B(\alpha, 2-\alpha). \qquad (80)$$

This is purely a statement about the distribution of $X$, which is shown by tedious but elementary manipulations. □
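Statement (80) can be checked numerically for a concrete law. Assume (our choice, not the text's) that $X = \lfloor Y \rfloor$ with $Y$ Pareto($\alpha$), so that $P(X = k) = k^{-\alpha} - (k+1)^{-\alpha}$ for $k \ge 1$ and (75) holds with $C = 1$. The sketch below (hypothetical function `scaled_moment`) truncates the series and bounds the remainder; agreement with the limit is only approximate, since the correction terms decay like a power of $M$.

```python
from math import gamma


def scaled_moment(M: float, alpha: float, K: int = 1_000_000) -> float:
    """M^alpha * E[X(X-1)/(X+M)^2] for the hypothetical law X = floor(Y),
    Y ~ Pareto(alpha), i.e. P(X = k) = k^-alpha - (k+1)^-alpha for k >= 1.

    For this law P(X > x) ~ x^-alpha, so (75) holds with C = 1. The series is
    truncated at K; since X(X-1)/(X+M)^2 <= 1, the remainder is at most
    P(X > K) = (K+1)^-alpha, which is added as an (upper) approximation."""
    s = 0.0
    for k in range(2, K + 1):    # the k = 1 term vanishes
        p_k = k ** -alpha - (k + 1) ** -alpha
        s += p_k * k * (k - 1) / (k + M) ** 2
    s += (K + 1) ** -alpha
    return M ** alpha * s


alpha, M = 1.5, 1.0e4
approx = scaled_moment(M, alpha)
limit = alpha * gamma(alpha) * gamma(2 - alpha)   # C * alpha * B(alpha, 2 - alpha)
print(approx, limit)
```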

The second ingredient of the proof is to show a limit theorem for the probability that there is a $p$-merger for some $0 < p < 1$ at a given generation. Note that this is essentially the same as asking that $X_1/S_N > p$.



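Lemma 3.3.

$$\lim_{N \to \infty} \frac{N}{c_N}\,P\left(\frac{X_1}{S_N} > p\right) = \frac{1}{B(2-\alpha, \alpha)} \int_p^1 y^{-1-\alpha}(1-y)^{\alpha-1}\,dy. \qquad (81)$$

Proof. (sketch) To explain how this comes about, we follow the same heuristic as above, and write:

$$\frac{X_1}{S_N} \approx \frac{X_1}{X_1 + N\mu},$$

so that:

$$P\left(\frac{X_1}{S_N} > p\right) \approx P\left(\frac{X_1}{X_1 + N\mu} > p\right) = P\left(X_1 > \frac{p}{1-p}\,\mu N\right).$$

Using the assumption (75), we deduce, using Lemma 3.2:

$$\frac{N}{c_N}\,P\left(\frac{X_1}{S_N} > p\right) \to \frac{1}{\alpha\,B(2-\alpha, \alpha)} \left(\frac{1-p}{p}\right)^{\alpha}.$$

Using the substitution $z = (1-y)/y$ in the integral of (81), the right-hand side of (81) becomes $\frac{1}{B(2-\alpha,\alpha)} \int_0^{(1-p)/p} z^{\alpha-1}\,dz = \frac{1}{\alpha B(2-\alpha,\alpha)} \left(\frac{1-p}{p}\right)^{\alpha}$, which is precisely what was requested. □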
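The substitution $z = (1-y)/y$ used above can be double-checked numerically. The sketch below (hypothetical function names) compares a midpoint-rule evaluation of the integral in (81) with the closed form $((1-p)/p)^{\alpha}/\alpha$.

```python
def tail_integral(p: float, alpha: float, n: int = 200_000) -> float:
    """Midpoint rule for int_p^1 y^(-1-alpha) (1-y)^(alpha-1) dy."""
    h = (1.0 - p) / n
    total = 0.0
    for i in range(n):
        y = p + (i + 0.5) * h
        total += y ** (-1 - alpha) * (1 - y) ** (alpha - 1)
    return total * h


def closed_form(p: float, alpha: float) -> float:
    """((1 - p)/p)^alpha / alpha, the value after the substitution z = (1-y)/y."""
    return ((1 - p) / p) ** alpha / alpha


for p, alpha in [(0.3, 1.5), (0.5, 1.2)]:
    print(p, alpha, tail_integral(p, alpha), closed_form(p, alpha))
```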

The last lemma shows that the infinitesimal rate of a $p$-merger is, for any $0 < p < 1$, approximately what it would be if this were a Beta-coalescent. From there, it is not hard to conclude case 2 of Theorem 3.8 (the i.i.d. structure of the generations gives the asymptotic Markov property of the coalescent). Thus the proof is complete. □

Remark 3.3. Theorem 3.8 should be compared to the earlier paper of Sagitov [139]: in that paper, general Cannings models are considered and it is shown that the genealogy can converge to any $\Lambda$-coalescent under the appropriate assumptions. (This is what led him to define $\Lambda$-coalescents in the first place, as opposed to the more "abstract" route based on consistency and exchangeability which was followed by Pitman and in these notes.) His main assumption is that $N c_N^{-1} P(\nu_1 > Nx) \to \int_x^1 y^{-2}\,\Lambda(dy)$ for $0 < x < 1$, together with some additional moment assumptions. The model of Theorem 3.8 is of course a particular case of the Cannings model; however, checking this main assumption is where all the work lies.



3.2.2 Selective sweeps 

In this section we describe the effect of a phenomenon called selective sweeps on the genealogy of a population. As usual, we will start by explaining what we are trying to model (this will lead us to the notions of recombination, hitchhiking and selective sweeps, all fundamental concepts in population genetics) and then explain the mathematical model and associated results, which are due to Durrett and Schweinsberg [68].

This is the first model in these notes which does not ignore selection. When a favourable allele is generated through mutation, it quickly spreads out to the whole population (this is easy to see with variations of the Moran model: suppose at each step we have a higher chance to kill an A individual than an a individual; such a selective advantage quickly drives the A population out with positive probability). When we look at the ancestral lineages, what happens is that all lineages quickly coalesce into one, which is the lineage corresponding to the individual that got the mutation. Thus we have approximately a star-shaped coalescent at this time, which isn't so interesting. However, some interesting things occur when we look at another location of the genome (a different locus). The reason why this is interesting is that there are nontrivial correlations between the genotypes of an individual at different locations.

The main mechanism which gives rise to this correlation is called 
recombination. This is a type of mutation which rearranges large 
portions of one's genetic material: more precisely, it causes two ho- 
mologous chromosomes to exchange genetic material. As a result, 
a chromosome that is transmitted to a recombinant's offspring con- 
tains genetic material both from the mother and the father (whereas 
normally, it is only that of one of the two parents). Recombination 
is a truly fundamental process of life, as it guarantees a mixing of 
the genetic material. 

Suppose a selective sweep occurred at some locus $\alpha$, where the allele $a$, being favourable, drove out the resident $A$ population, and consider a different locus $\beta$ along the same chromosome, this one being selectively neutral. Now, in a sample of the population, after the sweep, most individuals descend from the initial mutant that got the favourable $a$ allele at locus $\alpha$. On the face of it, one would thus expect that at locus $\beta$, everybody should carry the same allele as the one this individual had at locus $\beta$ (say $b$). However, because of recombination, some individuals got their genetic material at locus $\beta$ from individuals which may not have been descendants of the original mutant. As a consequence, a fraction of individuals "escape" the selective sweep. Let $Q$ be the random ancestral partition, which (as usual) tells us which individuals from the sample of size $n$ have the same ancestors at the time of the advantageous mutation.
Then this elementary reasoning shows that this random partition is likely to be "close" to the partition $K_p$ which defines $p$-mergers: that is, with one nontrivial block $B$ which contains a positive fraction $p$ of all integers, selected by independent coin-tossing (in our description, $1-p$ is the probability to escape the sweep). Of course, this demands some care, as there can be several sources of error in this reasoning (for instance, once a lineage escapes the sweep by recombination, the parent of the recombinant could itself be a descendant of the initial mutant, or individuals who escape the sweep may coalesce together; however, all those things are unlikely if the sweep occurs rapidly compared to the time scale of Kingman's coalescent).
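The partition $K_p$ is straightforward to sample, which may help when comparing it with simulated sweeps. In the sketch below (hypothetical function name `sample_kp`), each integer of $\{1, \dots, n\}$ independently falls into the nontrivial block $B$ with probability $p$, so $|B|$ is Binomial$(n, p)$; all other integers remain singletons.

```python
import random


def sample_kp(n: int, p: float, rng: random.Random) -> list[list[int]]:
    """Sample the p-merger partition K_p of {1, ..., n}: each integer
    independently joins the nontrivial block B with probability p (it does
    not escape the sweep); every other integer stays a singleton."""
    coins = [rng.random() < p for _ in range(n)]
    block = [i + 1 for i in range(n) if coins[i]]
    singletons = [[i + 1] for i in range(n) if not coins[i]]
    return ([block] if block else []) + singletons


rng = random.Random(7)
print(sample_kp(10, 0.6, rng))
```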

The fact that different loci are not independent is called linkage disequilibrium, so we have seen how recombination is a (main) contributor to this disequilibrium. That a selectively neutral allele can quickly invade a large part of the population due to linkage disequilibrium (for instance through recombination with a favourable allele) is known as genetic hitchhiking. I believe that the first rigorous investigation of this phenomenon goes back to Maynard Smith and Haigh [118] in 1974, which was another cornerstone of theoretical population genetics.

Model. It is time to define a model and state a first theorem. First, let $0 < s < 1$ be the selective advantage of the allele $a$: we work with a Moran model with selection, which says that every time an $a$ is replaced by an $A$ individual, this change is rejected with probability $s$. Let $0 < r < 1$ be the recombination probability at locus $\beta$: in our setting, this means that when a new individual is born, it adopts the genetic material of its parent at both loci most of the time, but with probability $r$, the allele at locus $\beta$ comes from a different parent who is selected uniformly at random in the population (this is because we are treating the two parents as two separate members of the population).

Theorem 3.9. (Durrett and Schweinsberg [67], [68]) Let $p = \exp(-r(\log N)/s)$. Assume that there exists $C_1$ such that $r \le C_1/\log N$. Then there exists $C > 0$ such that, conditionally given that allele $a$ eventually invades the whole population:

$$d_{TV}(Q, K_p) \le \frac{C}{\log N}. \qquad (82)$$

Here $d_{TV}$ denotes the total variation distance between the law of the random partition $Q$ and that of the $p$-merger partition $K_p$:

$$d_{TV}(Q, K_p) = \sup_{\pi} |P(Q = \pi) - P(K_p = \pi)|,$$

where the supremum is over partitions $\pi$ of $[n]$.

We note that martingale arguments imply that the probability that allele $a$ eventually invades the whole population (and thus that a selective sweep occurs) is

$$\frac{s}{1 - (1-s)^N},$$

which is approximately $s$ if $s$ is large compared to $1/N$, or approximately $1/N$ if $s$ is smaller.
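The fixation probability above can be checked by simulation. The sketch below assumes the standard Moran model, in which, without selection, each effective change of the number of $a$ individuals is $+1$ or $-1$ with equal probability. Rejecting $a \to A$ replacements with probability $s$ then makes an up-step happen with probability $1/(2-s)$, and gambler's ruin with ratio $1-s$ gives the formula above. The function names are hypothetical.

```python
import random


def fixation_probability(N: int, s: float) -> float:
    """Closed form s / (1 - (1 - s)^N) from the text."""
    return s / (1 - (1 - s) ** N)


def simulate_fixation(N: int, s: float, runs: int, seed: int = 0) -> float:
    """Monte Carlo estimate for a single initial a mutant.

    Only effective replacements are tracked: the number k of a individuals
    goes up with probability 1/(2 - s) and down with probability
    (1 - s)/(2 - s), until it hits 0 (loss) or N (fixation)."""
    rng = random.Random(seed)
    up = 1.0 / (2.0 - s)
    fixed = 0
    for _ in range(runs):
        k = 1
        while 0 < k < N:
            k += 1 if rng.random() < up else -1
        fixed += (k == N)
    return fixed / runs


print(simulate_fixation(50, 0.1, 10_000), fixation_probability(50, 0.1))
```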

While Theorem 3.9 tells us what the genealogies look like between 
the beginning of the selective sweep and its end, in reality that is not 
what we care about: we do not simply wish to trace ancestral lineages 
back to the most recent selective sweep, but we wish to describe the 
entire genealogical tree of the sample of the population we are looking 
at. In that case, it is more likely that the genealogy has been affected 
by a series of selective sweeps that have occurred at various portions 
of the genome. We still assume that the locus we are considering 
is neutral, but study the combined effects of recombination after a 
series of selective sweeps. Of course, we cannot expect the selective 
advantage s and recombination probability r to be the same during 
all those events: this depends on the type of mutation, but also 
on the position of this advantageous mutation with respect to the 
observed locus: the further away this advantageous mutation occurs, 
the smaller the recombination probability r. This led Gillespie [89]
to propose the following: 

Model. We run the usual Moran model dynamics. In addition, the chromosome is identified with the interval $I = [-L, L]$ and we observe the locus at position $x = 0$. Mutations occur as a point process

$$\mathcal{P} = \sum_{i \ge 1} \delta_{(t_i, x_i, s_i)}$$

on the state space $[0, \infty) \times [-L, L] \times (0, 1)$. The first coordinate stands for time, the second for the position on the chromosome, and the third coordinate $s$ is the selective advantage associated with this mutation. We assume that $\mathcal{P}$ is a Poisson point process, whose intensity measure $K$ is given by:

$$K(dt, dx, ds) = dt \otimes \mu(dx, ds),$$

where the measure $\mu(dx, ds)$ governs the rate of beneficial mutations with selective advantage $s$ occurring at position $x$. We also assume given a function $r : [-L, L] \to [0, 1)$, which tells us what is the recombination probability $r$ when there is a mutation at position $x$ along the chromosome. The function $r$ that we have in mind is something like $r(x) = r|x|$ (i.e., the recombination probability is proportional to the distance), but we will simply assume that:

1. $r(0) = 0$;

2. $r$ is decreasing on $[-L, 0]$ and increasing on $[0, L]$.

In general, we will work with a Poisson point process $\mathcal{P} = \mathcal{P}_N$, where the subscript indicates a possible dependence on the total population size $N$, and will do so consistently throughout the rest of this section (so that $\mu = \mu_N$ and $r = r_N$). Strictly speaking, one must also specify what happens if a selective sweep starts when a previous one hasn't yet been completed. Here we will simply reject this possibility (in the regime we will study, this situation is too infrequent anyway).

We may now state Durrett and Schweinsberg's key approximation for this model (Theorem 2.2 in [68]):

Theorem 3.10. Assume that the functions $r_N$ are such that $(\log N) r_N$ converges uniformly towards a function $R : [-L, L] \to [0, \infty)$ satisfying conditions 1 and 2 above. Suppose also that $N\mu_N$ converges weakly to a measure $\mu$. Then the genealogies, sped up by a factor of $N$, converge (for finite-dimensional distributions) to a $\Lambda$-coalescent, where $\Lambda = \delta_0 + x^2\,\eta(dx)$, and where $\eta$ is the measure on $(0, 1]$ defined by

$$\int_0^1 g(y)\,\eta(dy) = \int_{[-L, L] \times (0, 1)} s\,g\!\left(e^{-R(x)/s}\right) \mu(dx, ds) \qquad (83)$$

for every bounded measurable $g$; that is, $\eta$ is the image of $s\,\mu(dx, ds)$ under the map $(x, s) \mapsto e^{-R(x)/s}$.

The term "$s$" in the integrand corresponds to requiring that the sweep is successful, and the other term comes directly from Theorem 3.9. Note that (as noted in [68]) the finite-dimensional distribution convergence may not be strengthened to a Skorokhod-type convergence, as there are in reality several transitions occurring "simultaneously" when there is a single selective sweep.
To get some intuition for (83), it helps to consider a few examples. If all mutations have the same selective advantage $s$, and if $r_N(x) = r/\log N$ for some fixed $r > 0$, with all mutations occurring at rate $a/N$ for some $a > 0$, then the measure $\eta$ which appears in Theorem 3.10 is a point mass at $p = e^{-r/s}$ of mass $sa$.

If now $\mu(dx, ds) = a\,dx \otimes \delta_s(ds)$ (that is, the selective advantage is still constant and the mutation rate is constant along the chromosome, with total rate $2aL/N$), and if the recombination probability grows linearly with the distance, $R(x) = r|x|$ for a constant $r > 0$, then $\Lambda = \delta_0 + \Lambda_0$, where $\Lambda_0$ has density $cy$ for $e^{-rL/s} < y < 1$ and 0 otherwise, with $c = 2as^2/r$. In particular, as $L \to \infty$ (infinitely long chromosomes), this is the measure $\Lambda_0(dy) = cy\,dy$ for all $0 < y < 1$.

Finally, note that any measure $\Lambda$ which contains a unit mass at 0 may arise in Theorem 3.10 (see Example 2.5 in [68]).
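The second example can be verified directly from the definition (83). The sketch below assumes $\mu(dx, ds) = a\,dx \otimes \delta_s(ds)$ on $[-L, L]$ and $R(x) = r|x|$, computes $\Lambda_0([y_1, y_2]) = \int_{[y_1, y_2]} y^2\,\eta(dy)$ by integrating over the position $x$, and compares it with the mass $c(y_2^2 - y_1^2)/2$ of the density $cy$. The parameter values and the function name are illustrative only.

```python
from math import exp


def lambda0_mass(y1: float, y2: float, a: float, s: float, r: float, L: float,
                 n: int = 200_000) -> float:
    """Lambda_0([y1, y2]) = int_{[y1, y2]} y^2 eta(dy), computed directly from
    (83): eta is the image of s * mu(dx, ds) under x -> exp(-R(x)/s), here with
    mu(dx, ds) = a dx (x) delta_s(ds) on [-L, L] and R(x) = r|x| (assumed).
    Midpoint rule in the position variable x."""
    h = 2 * L / n
    total = 0.0
    for i in range(n):
        x = -L + (i + 0.5) * h
        y = exp(-r * abs(x) / s)
        if y1 <= y <= y2:
            total += s * a * y * y * h
    return total


a, s, r, L = 1.0, 0.05, 0.02, 10.0   # illustrative parameters; e^(-rL/s) = e^-4
c = 2 * a * s ** 2 / r
y1, y2 = 0.2, 0.8
numeric = lambda0_mass(y1, y2, a, s, r, L)
exact = c * (y2 ** 2 - y1 ** 2) / 2      # integral of the density c*y over [y1, y2]
print(numeric, exact)
```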

Comments 

1. The upper bound in Theorem 3.9 is of size $1/\log N$, which, in practice, is not that small. Durrett and Schweinsberg prove that a better approximation can be obtained by using a coalescent with simultaneous multiple collisions.

2. Etheridge, Pfaffelhuber and Wakolbinger [74] independently (and 
simultaneously) obtained some equivalent approximations but using 
a quite different route. 

3.3 Some results by Bertoin and Le Gall 

In this section, we briefly go over some of the results proved by 
Bertoin and Le Gall in their papers [30] and [31]. This section is 
intended to give a bird's eye view on this part of their work, which 
would take more time to cover properly. This section does not cover 
the work of [29] and [32] (the former will be discussed towards the end 
of the notes in connection with the Bolthausen-Sznitman coalescent,
while the latter will be discussed in the next section with the fine 
asymptotics of A-coalescents). 

The first observation of Bertoin and Le Gall is that any $\Lambda$-coalescent process may be realised as a stochastic flow in the classical sense of Harris [98]. The state space of the flow is the so-called space of bridges, that is, càdlàg nondecreasing functions going from 0 to 1 on the interval $[0, 1]$:

$$\mathcal{B} = \{f : [0, 1] \to [0, 1] \text{ càdlàg nondecreasing};\ f(0) = 0 \text{ and } f(1) = 1\}. \qquad (84)$$

This point of view allows one to define a measure-valued process called the (generalized) Fleming-Viot process, which is a neat way of generalising the notion of duality to $\Lambda$-coalescents. The stochastic differential equations which describe the generalized Fleming-Viot process (i.e., the equivalent of the Wright-Fisher diffusion in this context) are then studied, and finally this is used to come back to Kingman's coalescent and show a surprising connection to a coalescing flow of particles on the circle. We will follow a somewhat different order of presentation, starting with Fleming-Viot processes, partly because of their importance in what follows.

3.3.1 Fleming-Viot processes

The idea behind Fleming-Viot processes is quite simple, but unfortunately things often look messy when written down. We will try to stay as informal as possible, following for instance Etheridge's excellent discussion of the subject [72].

Suppose we consider the population dynamics given by the Moran model, with a total population size equal to $N$, run for an undetermined but finite amount of time. Suppose that in addition to the data of the ancestral lineages, we add another piece of information, which is the allelic type carried by each individual. Of course, this depends on the initial allelic type of every individual in the population at the beginning of times. To make things simple, we imagine that, initially, all the individuals carry different types. We label them, e.g., $A_1(0), A_2(0), \dots, A_N(0)$, and note that it absolutely doesn't matter what the state space of the variables $A_i(0)$ is. For the sake of convenience we choose them to be independent random variables $U_1, \dots, U_N$, uniformly distributed on $(0, 1)$. These types are then transferred to the offspring of individuals according to the population dynamics, and this gives us for every time $t$ a collection $\{A_i(t)\}_{i=1}^{N}$. Each of the $A_i(t)$ is thus an element of the initial collection $A_1(0), \dots, A_N(0)$.

Consider the distribution of allelic types at some time $t > 0$: what
does it look like? The first thing to note is that the vector
$(A_1(t), A_2(t), \dots, A_N(t))$ is exchangeable. This suggests
considering the measure

$$\mu_N(t) = \frac{1}{N} \sum_{i=1}^N \delta_{A_i(t)} \qquad (85)$$

and taking a limit as $N \to \infty$. Indeed, if we speed up time by a
factor $N$, nothing prevents us from defining directly an infinite
sequence of labels $(A_i(t))_{i \ge 1}$ which has the following
dynamics:

1. Initially $A_i(0) = U_i$, where $(U_i)_{i \ge 1}$ is a collection of
i.i.d. uniform random variables on $(0,1)$.

2. At rate 1, for each $i < j$, $A_j(t)$ becomes equal to $A_i(t)$.
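The dynamics 1-2, restricted to the first $n$ levels, form a closed system (a level only ever copies a lower one), so they are easy to simulate exactly. The sketch below is our own illustration, not code from [72]; the function name `lookdown_kingman` and the parameter values are hypothetical. It shows that after a time of order 1 only a handful of distinct labels survive among many levels, which is the finite-atom phenomenon described next.

```python
import random


def lookdown_kingman(n, t_max, seed=0):
    """Simulate levels 1..n of the lookdown dynamics up to time t_max.

    Each pair of levels i < j fires at rate 1; when it fires, level j
    copies the current label of level i. Initial labels are i.i.d.
    uniform on (0, 1). Returns (initial_labels, final_labels).
    """
    rng = random.Random(seed)
    labels = [rng.random() for _ in range(n)]
    initial = list(labels)
    total_rate = n * (n - 1) / 2             # one unit of rate per pair
    t = rng.expovariate(total_rate)          # time of the first event
    while t <= t_max:
        i, j = sorted(rng.sample(range(n), 2))   # uniform pair with i < j
        labels[j] = labels[i]                    # level j looks down to level i
        t += rng.expovariate(total_rate)
    return initial, labels


initial, final = lookdown_kingman(n=200, t_max=1.0, seed=1)
print(len(set(final)), "distinct labels among the first 200 levels")
```

Level 1 never changes, and every surviving label is one of the initial ones; the number of distinct labels among the first $n$ levels at time $t$ should be distributed as the number of blocks of Kingman's $n$-coalescent at time $t$.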

Note then that the sequence $(A_i(t))_{i \ge 1}$ is an infinite
exchangeable sequence, so we can apply De Finetti's theorem, which
tells us that for each fixed $t > 0$, the empirical distribution of
labels $\mu_N(t)$ defined in (85) has a weak limit almost surely. In
fact, the next result gives a stronger statement. To state it we need
the following notation: if $n \ge 1$ and $f(x_1, \dots, x_n)$ is any
continuous function on $[0,1]^n$, one may define a function $F$ on
measures $\mu$ on $[0,1]$ by setting

$$F(\mu) = \int \cdots \int f(x_1, \dots, x_n)\, \mu(dx_1) \cdots \mu(dx_n). \qquad (86)$$

The function $F$ may be interpreted as follows: given a measure $\mu$
on $[0,1]$, sample $n$ points $(x_1, \dots, x_n)$ independently
according to $\mu$ and evaluate $f(x_1, \dots, x_n)$. The expectation
of this random variable (conditionally given $\mu$) is equal to
$F(\mu)$. Further, if $x = (x_1, \dots, x_n) \in \mathbb{R}^n$, let
$x^{ij}$ denote the element $x' \in \mathbb{R}^n$ with all coordinates
$x'_k = x_k$ except $x'_j = x_i$. That is, $x^{ij}$ is $x$ but where
$x_j$ is replaced with $x_i$.

Theorem 3.11. As $N \to \infty$, $(\mu_N(t), t \ge 0)$ converges almost
surely towards a measure-valued strong Markov process
$(\mu_t, t \ge 0)$, called the Fleming-Viot diffusion. Initially,
$\mu_0$ is the uniform measure on $(0,1)$, but for any fixed $t > 0$,
the measure $\mu_t$ consists of finitely many atoms. Moreover, it has a
generator $L$ defined by the following property: if $F$ is a function
of the form (86), then

$$LF(\mu) = \int \cdots \int \sum_{1 \le i < j \le n} \big( f(x^{ij}) - f(x) \big)\, \mu(dx_1) \cdots \mu(dx_n). \qquad (87)$$

This property characterises the Fleming-Viot process $(\mu_t, t \ge 0)$.

The almost sure convergence referred to in this theorem corresponds to
the topology on measure-valued functions defined by saying that
$\mu_N(t)(A) \to \mu_t(A)$ for Borel sets $A$, uniformly in $t$ on
compact subsets of $(0, \infty)$.
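For a purely atomic measure $\mu$, the integrals in (86) and (87) become finite sums, which makes the generator concrete. The following brute-force sketch (cost $k^n$ for $k$ atoms; the names `F` and `LF` are ours, chosen for illustration) evaluates both. With $f(x_1,x_2) = x_1 x_2$ the generator returns the variance of $\mu$. With the indicator $f = \mathbf{1}_{\{x_1 = x_2\}}$ (not continuous, used only as an illustration), $F(\mu) = \sum_a \mu(\{a\})^2$ is the homozygosity and $LF = 1 - F$, so homozygosity drifts towards 1 as atoms are lost.

```python
import itertools


def F(f, atoms, weights, n):
    """F(mu) from (86) for mu = sum_k weights[k] * delta_{atoms[k]}."""
    total = 0.0
    for idx in itertools.product(range(len(atoms)), repeat=n):
        w = 1.0
        for k in idx:
            w *= weights[k]
        total += w * f(*[atoms[k] for k in idx])
    return total


def LF(f, atoms, weights, n):
    """Generator (87): sum over pairs i < j of F evaluated at x^{ij}, minus F."""
    base = F(f, atoms, weights, n)
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        def f_ij(*x, i=i, j=j):
            y = list(x)
            y[j] = y[i]          # x^{ij}: replace x_j by x_i
            return f(*y)
        total += F(f_ij, atoms, weights, n) - base
    return total


atoms, weights = [0.0, 1.0], [0.5, 0.5]
print(LF(lambda a, b: a * b, atoms, weights, 2))            # -> 0.25
print(LF(lambda a, b: float(a == b), atoms, weights, 2))    # -> 0.5
```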

See, for instance, (1.49) in [72] for the form of the generator of the
Fleming-Viot process. (The construction which we have used here is
closer in spirit to the "almost sure construction" of Chapter 5 in
[72] and to the Donnelly-Kurtz lookdown process; more about that
later, see Definition 4.4.) The fact that $\mu_t$ consists of only
finitely many atoms for every $t > 0$ is in fact the same phenomenon
as the fact that Kingman's coalescent comes down from infinity.

We note that there exist numerous versions of Fleming-Viot processes.
The version which we have considered here is the simplest possible:
for instance, there are no mutations in this description.
Incorporating small mutations (with allelic type given by an element
of the integer lattice $\mathbb{Z}^d$) leads (in the limit of small
mutation steps) to the spatial Fleming-Viot process, which is related
to super-Brownian motion (more about this later, too).

The generalisation which was considered by Bertoin and Le Gall in [30]
for Λ-coalescents was the following. Let $\Lambda$ be a fixed finite
measure on $(0,1)$ (without any atom at zero for simplicity). Consider
a population model with infinitely many individuals $1, 2, \dots$,
whose initial allelic types are, as above, i.i.d. uniform random
variables $U_1, U_2, \dots$. Let

$$\mathcal{P} = \sum_i \delta_{(p_i, t_i)}$$

be a Poisson point process with intensity $p^{-2}\Lambda(dp) \otimes dt$,
as in the Poissonian construction of Theorem 3.2. The model is defined
by saying that at each time $t_i$ such that $(p_i, t_i)$ is a point of
$\mathcal{P}$, we select a proportion $p_i$ of levels, say
$\ell_1 < \ell_2 < \dots$, by independent coin tosses (each level being
selected with probability $p_i$). Then the allelic types of individuals
$\ell_1, \ell_2, \dots$ are all modified and become equal to
$A_{\ell_1}(t_i^-)$. An example of the evolution of allelic types at one
such time is given in Figure 8.
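Restricted to the first $n$ levels, these dynamics are again a closed system, and they can be simulated exactly whenever $p^{-2}\Lambda(dp)$ has finite mass. The sketch below assumes, purely for illustration, $\Lambda = c\,\delta_p$ for a fixed $p \in (0,1)$, so that events occur at rate $c/p^2$; the function name and parameters are hypothetical.

```python
import random


def lambda_lookdown(n, p, c, t_max, seed=0):
    """Levels 1..n of the Lambda-lookdown with Lambda = c * delta_p (an assumption).

    Events occur at rate c / p**2 (the mass of p^{-2} Lambda(dp)). At an
    event every level is selected independently with probability p, and all
    selected levels take the label of the lowest selected level.
    """
    rng = random.Random(seed)
    labels = [rng.random() for _ in range(n)]
    initial = list(labels)
    rate = c / p ** 2
    t = rng.expovariate(rate)
    while t <= t_max:
        selected = [i for i in range(n) if rng.random() < p]   # coin tosses
        if len(selected) >= 2:
            lowest = selected[0]                # indices are increasing
            for i in selected[1:]:
                labels[i] = labels[lowest]
        t += rng.expovariate(rate)
    return initial, labels


initial, final = lambda_lookdown(n=100, p=0.3, c=1.0, t_max=2.0, seed=2)
print(len(set(final)), "distinct labels among the first 100 levels")
```

Unlike the Kingman case, a single event can make many levels share a label at once, which is the source of the multiple collisions in the dual coalescent.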

Figure 8: Example of evolution of allelic types in the Λ-Fleming-Viot
process. The red crosses indicate which levels were selected by coin
tossing.

To see that this construction is well-defined, note that (as in the
case of Λ-coalescents) the rate at which something happens in the
first $n$ levels of the population (i.e., among the first $n$
individuals of this infinite population) is finite, and that the
restrictions are consistent. Again, we can consider the empirical
distribution of allelic types at time $t > 0$:

$$\mu_N(t) = \frac{1}{N} \sum_{i=1}^N \delta_{A_i(t)},$$

and consider limits as $N \to \infty$. For a fixed $t > 0$, it is easy
to see that the sequence $(A_i(t))_{i \ge 1}$ is exchangeable: this is
slightly counterintuitive, as it seems a priori that lower levels play
a more important role than upper ones, but it is nevertheless true and
is a consequence of the fact that the initial type sequence is i.i.d.
and therefore exchangeable. By De Finetti's theorem, the limit of
$\mu_N(t)$ thus exists almost surely, and one has the following. If
$x = (x_1, \dots, x_n) \in \mathbb{R}^n$ and if
$I \subset [n] = \{1, \dots, n\}$, let $x^I$ denote the element
$x' \in \mathbb{R}^n$ with coordinates equal to those of $x$, except
that all the coordinates $x'_j$, $j \in I$, have been replaced with
$x_i$, where $i = \inf I$.

Theorem 3.12. As $N \to \infty$, $(\mu_N(t), t \ge 0)$ converges almost
surely towards a measure-valued strong Markov process
$(\mu_t, t \ge 0)$, called the generalised Fleming-Viot or
Λ-Fleming-Viot process. The generator $L$ is defined by the following
property: if $F$ is a function of the form (86), then

$$LF(\mu) = \int \cdots \int \sum_{I \subset [n],\, |I| \ge 2} \lambda_{n,|I|} \big( f(x^I) - f(x) \big)\, \mu(dx_1) \cdots \mu(dx_n), \qquad (88)$$

where $\lambda_{n,k} = \int_0^1 x^{k-2}(1-x)^{n-k}\, \Lambda(dx)$ is the
coalescence rate of any given $k$-tuple of blocks among $n$ in a
Λ-coalescent. The property (88) characterises the Λ-Fleming-Viot
process $(\mu_t, t \ge 0)$.
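When $\Lambda$ is a Beta$(a,b)$ probability distribution, the rates $\lambda_{n,k}$ in (88) have a closed form, since $\int_0^1 x^{k-2}(1-x)^{n-k} x^{a-1}(1-x)^{b-1}dx / B(a,b) = B(k-2+a,\, n-k+b)/B(a,b)$. The sketch below (function names are ours) computes them through log-Gamma functions. The uniform case $a = b = 1$ gives the Bolthausen-Sznitman rates $(k-2)!(n-k)!/(n-1)!$, and $a = 2-\alpha$, $b = \alpha$ gives the Beta-coalescents that appear later in these notes.

```python
from math import exp, factorial, lgamma


def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def lam(n, k, a, b):
    """lambda_{n,k} = int x^{k-2} (1-x)^{n-k} Lambda(dx) for Lambda = Beta(a, b).

    With Lambda(dx) = x^{a-1} (1-x)^{b-1} dx / B(a, b), the integral equals
    B(k - 2 + a, n - k + b) / B(a, b).
    """
    return exp(log_beta(k - 2 + a, n - k + b) - log_beta(a, b))


alpha = 1.5                       # Beta(2 - alpha, alpha)-coalescent
print([round(lam(5, k, 2 - alpha, alpha), 4) for k in range(2, 6)])
```

A useful sanity check is the consistency relation $\lambda_{n,k} = \lambda_{n+1,k} + \lambda_{n+1,k+1}$, obtained by writing $1 = (1-x) + x$ inside the integral; it is what makes the restrictions to $[n]$ consistent.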

The form of the generator and the fact that the martingale problem is
well-posed can be seen from Section 5.2 in [30], and essentially boil
down to the duality which we now discuss. It is fairly intuitive that
there is a relation of duality with the Λ-coalescent, which arises when
time is running backwards (as usual!). The basic reason for this is
that the time-reversal of a Poisson point process with intensity
$\nu \otimes dt$ is also a Poisson point process with the same
intensity. Here is the corresponding statement. To state it, we start
with a number $n \ge 1$ and a function $f(x_1, \dots, x_n)$ on
$[0,1]^n$. Let $\pi \in \mathcal{P}_n$ be a partition of $[n]$, and
assume that $\pi$ has $k$ blocks $B_1, \dots, B_k$. Then for all
$x \in \mathbb{R}^k$, we may define $x^\pi$ to be the element
$x' \in \mathbb{R}^n$ such that for all $1 \le i \le k$ and all
$j \in B_i$ (the $i$th block of $\pi$), $x'_j = x_i$. In short, each of
the $k$ coordinates of $x$ is copied along the blocks of $\pi$ to
create an element of $\mathbb{R}^n$. Define the functional

$$\Phi(\mu, \pi) = \int \cdots \int \mu(dx_1) \cdots \mu(dx_k)\, f(x^\pi). \qquad (89)$$

The duality relation states (see (18) in [30]):

Theorem 3.13. Let $\mathbb{E}^1_\mu$ denote the expectation for the
Λ-Fleming-Viot process $(\mu_t, t \ge 0)$ started from $\mu_0 = \mu$,
and let $\mathbb{E}^2_\pi$ denote that for the Λ-coalescent
$(\Pi_t, t \ge 0)$ restricted to $[n]$ and started from $\Pi_0 = \pi$.
Then we have, for all functions $\Phi$ of the form (89):

$$\mathbb{E}^1_{\mu}\big(\Phi(\mu_t, \pi_0)\big) = \mathbb{E}^2_{\pi_0}\big(\Phi(\mu, \Pi_t)\big), \qquad (90)$$

where $\pi_0$ is the trivial partition of $[n]$ into singletons.
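Here is a small numerical check of (90) in the Kingman case. Take $n = 3$, $\mu_0$ uniform and $f(x_1,x_2,x_3) = \mathbf{1}_{\{x_1 = x_2 = x_3\}}$ (an indicator rather than a continuous function, used only for illustration). Then $\Phi(\mu_t, \pi_0) = \sum_a \mu_t(\{a\})^3$, whose expectation is, by exchangeability, the probability that levels 1, 2, 3 of the lookdown share a label. On the other side, $\Phi(\mu_0, \pi)$ is 1 if $\pi$ has a single block and 0 otherwise, so the right-hand side is the probability that Kingman's 3-coalescent has one block at time $t$, i.e. $P(\mathrm{Exp}(3) + \mathrm{Exp}(1) \le t)$. The sketch compares a Monte Carlo estimate of the former with the closed form of the latter. Function names are hypothetical.

```python
import math
import random


def prob_three_levels_share_label(t, trials=20000, seed=0):
    """Monte Carlo estimate of P(A_1(t) = A_2(t) = A_3(t)) in the Kingman lookdown."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        labels = [0, 1, 2]                 # distinct initial labels
        s = rng.expovariate(3.0)           # three pairs, each firing at rate 1
        while s <= t:
            i, j = sorted(rng.sample(range(3), 2))
            labels[j] = labels[i]
            s += rng.expovariate(3.0)
        hits += len(set(labels)) == 1
    return hits / trials


def prob_kingman3_single_block(t):
    """P(Kingman's 3-coalescent has one block at time t), with T = Exp(3) + Exp(1)."""
    return 1.0 - 1.5 * math.exp(-t) + 0.5 * math.exp(-3.0 * t)


t = 1.0
print(prob_three_levels_share_label(t), prob_kingman3_single_block(t))
```

Both numbers should be close to $0.473$ at $t = 1$.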

We have already discussed how useful dual processes can be (for
instance, duality is a crucial step in proving that the martingale
problem is well-posed for the generalised Fleming-Viot process). In our
case, we will see later that Fleming-Viot processes are an essential
step in describing the connection of Λ-coalescents to continuous-state
branching processes and continuum random trees. Further, they will be
used below to describe natural stochastic differential equations and
stochastic flows attached to Λ-coalescents.

3.3.2 A stochastic flow of bridges 

We have already introduced the space of bridges $\mathcal{B}$ above, but
the random bridges we will discuss will have the following extra
property:

Definition 3.4. A random bridge $X$ is a random variable in
$\mathcal{B}$ such that the increments of $X$ are exchangeable.

There is a natural operation on bridges, which is the composition
$\circ$: if $X$ and $X'$ are two independent random bridges, then so is
$X \circ X'$.

What is the connection between bridges and Λ-coalescents? This
connection is simple to explain if we think about a Cannings model
whose genealogy is approximately a Λ-coalescent. (Recall that Sagitov
[139] showed this is always possible; see Remark 3.3.) Thus, let
$N > 1$, and let $\nu_1, \dots, \nu_N$ be the exchangeable vector
giving the respective progeny of individuals $1, \dots, N$. Then
$\sum_{i=1}^N \nu_i = N$, and we see that the vector
$(\nu_1, \dots, \nu_N)$ encodes a discrete random bridge: more
precisely, define a function

$$A : \{0, \dots, N\} \to \{0, \dots, N\}$$

such that, for $0 \le j \le N$:

$$A(j) = \sum_{i=1}^j \nu_i. \qquad (91)$$

Thus $A(j)/N$ is the fraction of the population at the next generation
which comes from the first $j$ individuals. (This interpretation will
be crucial for what comes after.) Note that $A$ is a discrete bridge in
the sense of Definition 3.4: it goes from $0$ to $N$ between times $0$
and $N$, and has exchangeable increments.
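To make (91) concrete, here is a sketch using Wright-Fisher offspring numbers (each child picks its parent uniformly at random). This is an assumption made only for illustration: the Wright-Fisher model is the simplest Cannings model, and its genealogy lies in Kingman's domain rather than that of a general Λ-coalescent. Function names are hypothetical.

```python
import random
from itertools import accumulate


def wright_fisher_offspring(N, rng):
    """Offspring vector (nu_1, ..., nu_N); assumption: each child picks a parent uniformly."""
    nu = [0] * N
    for _ in range(N):
        nu[rng.randrange(N)] += 1
    return nu


def discrete_bridge(nu):
    """A(j) = nu_1 + ... + nu_j for j = 0, ..., N, as in (91)."""
    return [0] + list(accumulate(nu))


rng = random.Random(3)
nu = wright_fisher_offspring(10, rng)
A = discrete_bridge(nu)
print(nu, A)
```

The list `A` starts at 0, ends at $N$, is nondecreasing, and its increments are the (exchangeable) offspring numbers.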

Now, introduce the time-dynamics of the Cannings model: thus we have an
i.i.d. collection of exchangeable vectors

$$(\nu_i(t))_{1 \le i \le N,\ t \in \mathbb{Z}}.$$

To this we can associate a discrete bridge $A_t$ for each
$t \in \mathbb{Z}$ as in (91). Note the following property: consider two
times $s < t$ in $\mathbb{Z}$. Then for each $0 \le j \le N$, the
fraction of individuals in the population at time $t$ that come from
the first $j$ individuals of the population at time $s$ is precisely

$$\frac{1}{N}\, A_{t-1} \circ \cdots \circ A_s(j). \qquad (92)$$

(Here, as implicitly in (91), the offspring of each generation are
labelled in the order of their parents, so that the children of the
first $j$ individuals are the first $A(j)$ individuals of the next
generation.) Thus let us define, for every $s \le t$ in $\mathbb{Z}$, the
bridge $B^N_{s,t}$ as the linear interpolation on $[0,1]$ of

$$B^N_{s,t}(x) = \frac{1}{N}\, A_{t-1} \circ \cdots \circ A_s(j) \qquad (93)$$

at $x = j/N$. Bertoin and Le Gall call the collection of random
variables $(B^N_{s,t})_{s \le t \in \mathbb{Z}}$ a discrete stochastic
flow of bridges because:

1. $B_{s,s}$ is the identity map.

2. $B_{t,u} \circ B_{s,t} = B_{s,u}$ for all $s \le t \le u$ in
$\mathbb{Z}$ (the cocycle property).

3. $B_{s,t}$ is stationary: its law depends only on $t - s$.

4. $B_{s,t}$ has independent increments: for every
$s_1 < s_2 < \cdots < s_n$ in $\mathbb{Z}$, the bridges
$B_{s_1,s_2}, \dots, B_{s_{n-1},s_n}$ are independent.
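The identity (92) can be checked directly on a small simulated Cannings model. The sketch below (again assuming Wright-Fisher offspring, for illustration only, with hypothetical helper names) composes the discrete bridges and compares the result with an explicit tracing of ancestors, using the convention that children are labelled in the order of their parents. It also checks the cocycle property 2.

```python
import random
from itertools import accumulate


def offspring(N, rng):
    nu = [0] * N
    for _ in range(N):                 # assumption: Wright-Fisher, uniform parent choice
        nu[rng.randrange(N)] += 1
    return nu


def bridge(nu):
    return [0] + list(accumulate(nu))


def compose(bridges):
    """A_{t-1} o ... o A_s for bridges = [A_s, ..., A_{t-1}]."""
    out = list(range(len(bridges[0])))     # identity on {0, ..., N}
    for A in bridges:
        out = [A[v] for v in out]
    return out


def parents(A):
    """Parent of child k (1-based) when children are labelled in parent order."""
    par = [0] * len(A)
    for i in range(1, len(A)):
        for k in range(A[i - 1] + 1, A[i] + 1):
            par[k] = i
    return par


N, T = 12, 6
rng = random.Random(7)
bridges = [bridge(offspring(N, rng)) for _ in range(T)]   # A_0, ..., A_{T-1}
flow = compose(bridges)                                    # N * B_{0,T} at the grid points

# Trace every individual at time T back to its ancestor at time 0.
ancestor = list(range(N + 1))
for A in reversed(bridges):
    par = parents(A)
    ancestor = [0] + [par[ancestor[k]] for k in range(1, N + 1)]

descendants = [sum(1 for k in range(1, N + 1) if ancestor[k] <= j) for j in range(N + 1)]
print(flow == descendants)
```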

(Note however that [30] and [31] take for their definition of
$B_{s,t}$ what we call here $B_{-t,-s}$.) When $N \to \infty$ and time is
sped up by a certain factor $c_N$ (the one which guarantees convergence
of the genealogies to a Λ-coalescent process), it is to be expected that
the flow

$$(B^N_{\lfloor s/c_N \rfloor, \lfloor t/c_N \rfloor},\ -\infty < s \le t < \infty) \longrightarrow (B_{s,t},\ -\infty < s \le t < \infty) \qquad (94)$$

with respect to some topology. $(B_{s,t}, -\infty < s \le t < \infty)$ is
then a (continuous) flow of bridges because it satisfies properties 1-4
above and furthermore, strengthening condition 1,

$$\lim_{t \to 0^+} B_{0,t} = \mathrm{Id}$$

in probability, in the sense of the Skorokhod topology. This is thus a
condition of continuity. We can now state Theorem 1 in [30], which
states the correspondence between bridges and Λ-coalescents. For a
random bridge $B$, let $s \in \mathcal{S}_0$ be the tiling of $(0,1)$
defined by the ranked sequence of jumps of $B$ (where the continuous
part is associated with the dust component $s_0$). As usual, the
correspondence arises by fixing a time (say $t = 0$) and running $s$
backwards:

Theorem 3.14. Let $(B_{s,t}, -\infty < s \le t < \infty)$ be the flow of
bridges defined by (94). Let $S(t)$ be the tiling of $(0,1)$ obtained
from the bridge $B_{-t,0}$, for all $t \ge 0$. Then $(S(t), t \ge 0)$ has
the same law as the ranked frequencies of a Λ-coalescent. Furthermore,
if $V_1, V_2, \dots$ are i.i.d. uniform random variables on $(0,1)$, we
may define a partition $\Pi_t$ by saying $i \sim j$ if and only if $V_i$
and $V_j$ fall into the same jump of $B_{-t,0}$. Then

$(\Pi_t, t \ge 0)$ is a Λ-coalescent.
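The "paintbox" step of Theorem 3.14 (uniform points falling into the same jump) can be illustrated on a simple family of bridges with exchangeable increments, $b(u) = s_0 u + \sum_k s_k \mathbf{1}_{\{U_k \le u\}}$ with $U_k$ i.i.d. uniform and $s_0 = 1 - \sum_k s_k$. This particular form, with finitely many jumps, is an assumption made for the sketch. Each jump of size $s_k$ is the interval $[b(U_k^-), b(U_k)]$, and points $V_i$ landing in the continuous part become singletons (dust). Function names are hypothetical.

```python
import random


def bridge_with_jumps(sizes, rng):
    """Bridge b(u) = s0*u + sum_k s_k 1{U_k <= u} with U_k i.i.d. uniform (assumed form).

    Returns the list of image intervals [b(U_k-), b(U_k)] of the jumps;
    s0 = 1 - sum(sizes) is the continuous ("dust") part.
    """
    locs = [rng.random() for _ in sizes]
    s0 = 1.0 - sum(sizes)

    def b_left(u):                               # b(u-)
        return s0 * u + sum(s for s, U in zip(sizes, locs) if U < u)

    return [(b_left(U), b_left(U) + s) for s, U in zip(sizes, locs)]


def partition_from_jumps(jumps, V):
    """i ~ j iff V_i and V_j fall in the same jump interval; other points are singletons."""
    blocks = {}
    singletons = []
    for i, v in enumerate(V):
        for k, (lo, hi) in enumerate(jumps):
            if lo < v <= hi:
                blocks.setdefault(k, []).append(i)
                break
        else:
            singletons.append([i])
    return list(blocks.values()) + singletons


rng = random.Random(4)
sizes = [0.5, 0.3]                                   # dust s0 = 0.2
jumps = bridge_with_jumps(sizes, rng)
V = [rng.random() for _ in range(20000)]
blocks = partition_from_jumps(jumps, V)
freqs = sorted((len(B) / len(V) for B in blocks if len(B) > 1), reverse=True)
print(freqs)                                          # close to [0.5, 0.3]
```

The empirical block frequencies recover the ranked jump sizes, and the proportion of singletons recovers the dust $s_0$, as in the correspondence between tilings and exchangeable partitions.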

The proof outlined above was not the route used by Bertoin and Le Gall
to prove this theorem, and it might be worthwhile to turn this outline
into a more precise argument.

3.3.3 Stochastic Differential Equations 

The points of view developed above (that is, the flow of bridges on the
one hand, and the Fleming-Viot process on the other hand) allow us to
discuss analogues and generalisations of the Wright-Fisher stochastic
differential equation, which was used to describe the proportion of
individuals carrying a certain allele in a population whose genealogy
is approximately Kingman's coalescent.

To explain the first result in this direction, we first introduce what
may be called microscopic bridges or infinitesimal bridges: this is the
bridge which describes the effect of one "individual" at location
$x \in [0,1]$ having a progeny of size $p \in (0,1)$. This bridge has
the form:

$$b(u) = b_{x,p}(u) = u(1-p) + p\,\mathbf{1}_{\{u \ge x\}}. \qquad (95)$$

Let $(B_{s,t})_{-\infty < s \le t < \infty}$ be the flow of bridges
constructed in (94) and let $F_t = B_{0,t}$. Thus $F_t(y)$ is the
fraction of individuals at time $t$ descending from some individual in
the interval $[0,y]$ of the population at time 0 (this is indeed the
equivalent of the quantity which we track when we study the
Wright-Fisher diffusion, with initial fraction of allele $a$ equal to
$y$). When there is an atom $(x,p)$ at some time $t$, what is the
change in $F_t(y)$? This atom means that an individual located at
$x \in [0,1]$ is producing a macroscopic offspring, which represents a
fraction $p$ of the population right after. Thus by the composition
property, letting $F = F_{t^-}(y)$ and $F' = F_t(y)$, we see that
$F' = b(F) = F(1-p)$ when $F < x$, in which case the infinitesimal
increment is $dF = -Fp$. If, on the other hand, $F \ge x$, then we have
$F' = p + F(1-p)$, and thus the infinitesimal increment is
$dF = p(1-F)$. Thus define the function

$$\psi(x, p, F) = \begin{cases} -pF & \text{if } F < x, \\ p(1-F) & \text{if } F \ge x. \end{cases}$$
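The derivation of $\psi$ from (95) can be checked mechanically: $b_{x,p}(F) - F = \psi(x,p,F)$ in both cases. The sketch below (names are ours) also computes $\int_0^1 \psi(x,p,F)\,dx = -pF(1-F) + p(1-F)F = 0$ by a midpoint rule. This mean-zero property is what makes $F_t(y)$ a martingale when the atom location $x$ is uniform.

```python
def b(x, p, u):
    """Elementary bridge (95): u(1 - p) + p 1{u >= x}."""
    return u * (1 - p) + (p if u >= x else 0.0)


def psi(x, p, F):
    """Increment of F caused by an atom (x, p), as derived above."""
    return -p * F if F < x else p * (1 - F)


def mean_increment(p, F, M=10000):
    """Midpoint-rule approximation of the integral of psi(x, p, F) over x in [0, 1]."""
    return sum(psi((k + 0.5) / M, p, F) for k in range(M)) / M


print(mean_increment(0.3, 0.4))
```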



If $p^{-2}\Lambda(dp)$ is a finite measure, there is only a finite rate
of events in the stochastic flow of bridges of the previous paragraph.
If we label these events $(t_i, x_i, p_i)$ and let

$$M = \sum_i \delta_{(t_i, x_i, p_i)},$$

which is a Poisson point process with intensity
$dt \otimes dx \otimes p^{-2}\Lambda(dp)$, we get immediately that
$F_t = B_{0,t}$ may be written as a stochastic integral:

$$F_t(y) = y + \int_{[0,t] \times [0,1] \times [0,1]} M(ds, dx, dp)\, \psi\big(x, p, F_{s^-}(y)\big). \qquad (96)$$
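When $p^{-2}\Lambda(dp)$ is finite, (96) can be simulated exactly, atom by atom. The sketch below assumes $\Lambda = c\,\delta_p$ (a choice made only for illustration, with hypothetical function names) and drives several starting points $y$ with the same atoms, which produces a sample of the multi-point motion. Since each update applies the nondecreasing map $b_{x,p}$, the result stays nondecreasing in $y$ and inside $[0,1]$. Since $\psi$ has mean zero in $x$, $\mathbb{E}[F_t(y)]$ should remain equal to $y$.

```python
import random


def psi(x, p, F):
    return -p * F if F < x else p * (1 - F)


def simulate_flow(ys, p, c, t_max, rng):
    """Exact simulation of (96), assuming p^{-2} Lambda(dp) = (c / p**2) delta_p.

    All starting points ys are driven by the same atoms, so the result is
    one sample of the motion (F_t(y_1), ..., F_t(y_m)).
    """
    F = list(ys)
    rate = c / p ** 2
    t = rng.expovariate(rate)
    while t <= t_max:
        x = rng.random()                        # location of the atom
        F = [f + psi(x, p, f) for f in F]       # F <- b_{x,p}(F) for every y
        t += rng.expovariate(rate)
    return F


rng = random.Random(11)
ys = [0.1, 0.3, 0.5, 0.9]
runs = [simulate_flow(ys, p=0.2, c=1.0, t_max=1.0, rng=rng) for _ in range(4000)]
means = [sum(r[i] for r in runs) / len(runs) for i in range(len(ys))]
print(means)        # each mean should be close to the corresponding y
```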

It turns out that this stochastic integral still makes sense even if we
do not assume that $p^{-2}\Lambda(dp)$ is a finite measure. This is, as
usual, stated as a result of weak existence and uniqueness: any
filtered probability space with a Poisson point measure $M$ with
intensity $dt \otimes dx \otimes p^{-2}\Lambda(dp)$ and a càdlàg process
$X_t = (X_t(y), y \in [0,1])$ such that $X_t(y)$ satisfies the
stochastic differential equation (96) almost surely for all
$y \in [0,1]$, is called a weak solution of (96).

Proposition 3.1. (Theorem 2 in [31]) There exists a weak solution to
(96), with the additional property that a.s. for every $t \ge 0$, $X_t$
is a nondecreasing function on $[0,1]$. Furthermore, if $X$ is any
solution to (96), then $(X_t(y_1), \dots, X_t(y_p))$ has the same
distribution as the $p$-point motion $(F_t(y_1), \dots, F_t(y_p))$. In
particular there is weak uniqueness.

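There is an associated martingale problem, which may be formulated as
follows: given an atom $(x,p)$, we define an operator on functions
$g : [0,1]^p \to \mathbb{R}$ by:

$$\mathcal{A}_{x,p}g(y) = g\big(y + \psi(x,p,y)\big) - g(y) - \sum_{i=1}^p \psi(x,p,y_i)\, \frac{\partial g}{\partial y_i}(y),$$

where $y = (y_1, \dots, y_p)$ and
$\psi(x,p,y) = (\psi(x,p,y_1), \dots, \psi(x,p,y_p))$. Then for every
$y_1, \dots, y_p$, the $p$-point motion $(F_t(y_1), \dots, F_t(y_p))$
satisfies that

$$g\big(F_t(y_1), \dots, F_t(y_p)\big) - \int_0^t \mathcal{C}g\big(F_s(y_1), \dots, F_s(y_p)\big)\, ds$$

is a martingale, where

$$\mathcal{C}g(y) = \int_0^1 dx \int_0^1 p^{-2}\Lambda(dp)\, \mathcal{A}_{x,p}g(y).$$

This is well-defined since, for any $g \in C^2$, a Taylor expansion gives
$|\mathcal{A}_{x,p}g(y)| \le Cp^2$ for some $C$ which does not depend on
$p$ (or on $x$), so that the factor $p^{-2}$ is compensated and the
integral is bounded by $C\,\Lambda([0,1]) < \infty$.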
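The $O(p^2)$ bound can be seen explicitly on an example. For the 1-point motion and $g(y) = y^2$ one has $\mathcal{A}_{x,p}g(y) = \psi(x,p,y)^2$, and integrating over $x$ gives exactly $p^2 y(1-y)$. Hence $\mathcal{C}g(y) = \Lambda([0,1])\, y(1-y)$, which is the same quadratic-variation term as in the Wright-Fisher diffusion when $\Lambda$ is a probability measure. This is our own computation, and the numerical sketch below (names are hypothetical) checks it by a midpoint rule in $x$.

```python
def psi(x, p, y):
    return -p * y if y < x else p * (1 - y)


def A_op(g, dg, x, p, y):
    """A_{x,p} g(y) for a single point: g(y + psi) - g(y) - psi * g'(y)."""
    s = psi(x, p, y)
    return g(y + s) - g(y) - s * dg(y)


def integrated_A(g, dg, p, y, M=20000):
    """Midpoint-rule approximation of the integral of A_{x,p} g(y) over x in [0, 1]."""
    return sum(A_op(g, dg, (k + 0.5) / M, p, y) for k in range(M)) / M


def g(y):
    return y * y


def dg(y):
    return 2 * y


for p in (0.05, 0.2, 0.7):
    print(p, integrated_A(g, dg, p, 0.4) / p ** 2)   # compare with y(1 - y) = 0.24
```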

3.3.4 Coalescing Brownian motions 

We end this section with a surprising result of Bertoin and Le Gall,
which links Kingman's coalescent to a certain flow of coalescing
Brownian motions on the circle.

To this end, define an operator $\mathcal{T}$ on functions defined on the
torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$:

$$\mathcal{T}g(y_1, \dots, y_p) = \frac{1}{2} \sum_{i,j=1}^p h(y_i, y_j)\, \frac{\partial^2 g}{\partial y_i \partial y_j}(y_1, \dots, y_p), \qquad (97)$$

where the covariance function $h(y,y')$ satisfies:

$$h(y, y') = \frac{1}{12} - \frac{1}{2}\, d(y,y')\big(1 - d(y,y')\big), \qquad (98)$$

and $d$ denotes the distance on $\mathbb{T}$. The generator $\mathcal{T}$
defines a martingale problem, and we note that if $X$ is a solution to
this martingale problem (that is, if
$g(X_t) - \int_0^t \mathcal{T}g(X_s)\,ds$ is a martingale for every
$g \in C^2(\mathbb{T}^p)$), then each of the $p$ particles individually
follows a Brownian motion on the torus with diffusion coefficient
$\sqrt{1/12}$, since $h(y,y) = 1/12$. However, these Brownian motions
are not independent, and are correlated in a certain way. In
particular, we will see that particles following this flow have the
coalescing property: if $X^i_t = X^j_t$ for some time $t > 0$, this
stays true ever after.
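One heuristic way to see where (98) comes from uses the ε-flow of the proof below: an atom at $x$ with small mass $\varepsilon$ moves a point $y$ by approximately $\varepsilon(\{y - x\} - 1/2)$ (a sawtooth in $y - x$), and atoms arrive at rate $\varepsilon^{-2}$. The covariance of the displacements of $y$ and $y'$ per unit time is then $\int_0^1 (\{y-x\} - \tfrac12)(\{y'-x\} - \tfrac12)\,dx$. This is our own back-of-the-envelope computation, not an argument taken from [31]. The sketch below checks numerically that this integral agrees with $h(y,y')$. Function names are hypothetical.

```python
import math


def saw(u):
    """Sawtooth {u} - 1/2: first-order displacement profile (heuristic assumption)."""
    return u - math.floor(u) - 0.5


def h(y, yp):
    """Covariance function (98), with d the distance on the torus R/Z."""
    d = abs(y - yp) % 1.0
    d = min(d, 1.0 - d)
    return 1.0 / 12 - 0.5 * d * (1 - d)


def cov_rate(y, yp, M=50000):
    """Midpoint approximation of the integral over x of saw(y - x) * saw(yp - x)."""
    total = 0.0
    for k in range(M):
        x = (k + 0.5) / M
        total += saw(y - x) * saw(yp - x)
    return total / M


for y, yp in [(0.1, 0.1), (0.1, 0.35), (0.2, 0.7)]:
    print(y, yp, cov_rate(y, yp), h(y, yp))
```

Antipodal points ($d = 1/2$) get covariance $1/12 - 1/8 = -1/24$: they tend to move in opposite directions.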

Let $X$ be a solution to the martingale problem defined by (97) with
starting point $X_0 = (V_1, \dots, V_p)$ given by $p$ independent
uniform random points on the torus, and define a partition $\Pi_t$ of
$[p]$ by putting $i \sim j$ if and only if $X^i_t = X^j_t$, i.e.,
particles $i$ and $j$ have coalesced.

Theorem 3.15. There is existence and uniqueness in law for the
martingale problem defined by (97). For any solution $X$, the process
$(\Pi_t, t \ge 0)$ is Kingman's $p$-coalescent.

Proof. The uniqueness part of the result is a consequence of the fact
that the generator is smooth away from the diagonal (i.e., $X^i \ne X^j$
for $i \ne j$) and of the fact that particles which hit each other
coalesce, in the sense that they stay together forever. This can be
seen as follows: consider for example particles 1 and 2, and let
$T = \inf\{t > 0 : X^1_t = X^2_t\}$. Fix also a $z \in \mathbb{T}$, and
consider the process

$$Y_t = \langle z, X^1_t \rangle - \langle z, X^2_t \rangle \in [-1, 1],$$

where, if $z, x \in \mathbb{T}$, $\langle z, x \rangle$ denotes the length
of the counterclockwise arc from $z$ to $x$. This is a function of the
trajectories $X^1, X^2$ so long as neither $X^1$ nor $X^2$ equals $z$,
so if $T' = \inf\{t > 0 : X^1_t = z \text{ or } X^2_t = z\}$, and if
$g$ is any smooth real function, we get a local martingale

$$g(Y_t) - \int_0^t \frac{1}{2}\, |Y_s|\big(1 - |Y_s|\big)\, g''(Y_s)\, ds, \qquad t \le T',$$

for every such $g$ and any solution to (97). Thus on $[0, T']$, $Y_t$ is
the diffusion on $[-1,1]$ with generator

$$\frac{1}{2}\, |y|(1 - |y|)\, \frac{d^2}{dy^2},$$

for which zero is an absorbing boundary (this is, up to the sign, the
same generator as in the Wright-Fisher diffusion, where the absorption
property is easy to see). This guarantees that $Y_t = 0$ for all
$t \in [T, T']$, and from this the coalescence property follows easily.

To get the statement in the theorem which identifies the coalescent
process $\Pi_t$ as Kingman's coalescent, the idea of the proof is as
follows. Consider a measure $\nu(dp) = p^{-2}\Lambda(dp)$ on $(0,1)$, and
assume that $\nu$ is finite. Consider a Poisson point process

$$M = \sum_i \delta_{(t_i, x_i, p_i)}$$

with intensity $dt \otimes \lambda(dx) \otimes \nu(dp)$, where $\lambda$
is the Lebesgue measure on the torus $\mathbb{T}$. We use these points
to create a stochastic flow of bridges just as above, except that now
we consider functions not from $[0,1]$ to $[0,1]$ (bridges) but
functions from $\mathbb{T}$ to $\mathbb{T}$. A coalescence occurring at
individual $x \in \mathbb{T}$, with mass $p \in (0,1)$, corresponds to
the composition by an elementary "bridge" just as in (95), which sends
an arc of size $p$ centred at $x$ to $x$ and sends the complement of
this arc onto the full torus in a linear fashion. That is,

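$$\beta(y) = x \quad \text{if } d(y,x) < p/2,$$

and, letting $\bar{x}$ be the point sitting opposite of $x$ in
$\mathbb{T}$,

$$d(\bar{x}, \beta(y)) = \frac{d(\bar{x}, y)}{1-p} \quad \text{otherwise},$$

with $\beta(y)$ on the same side of $\bar{x}$ as $y$. We let
$\beta_{x,p} = \beta$ be the above function, and we call

$$\Phi_{s,t} = \beta_{x_k, p_k} \circ \cdots \circ \beta_{x_1, p_1},$$

where $(x_1, p_1), \dots, (x_k, p_k)$ is the list of atoms of $M$
between times $s$ and $t$, listed in increasing order of time (which is
possible since $\nu$ is assumed to be finite).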
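Taking $V_1, \dots, V_p$ a collection of i.i.d. uniform variables on
$\mathbb{T}$, it is then trivial (see, e.g., Theorem 3.13) to check that

$$(\Pi^\nu_t, t \ge 0) \text{ is a Λ-coalescent, with } \Lambda(dp) = p^2 \nu(dp), \qquad (99)$$

where $\Pi^\nu_t$ is defined by saying $i \sim j$ if
$\Phi_{0,t}(V_i) = \Phi_{0,t}(V_j)$. Specialising to the case where, for
$\varepsilon > 0$,

$$\Lambda^\varepsilon(dx) = \delta_\varepsilon(dx),$$

so that $\Lambda^\varepsilon \to \delta_0$ and the associated coalescent
$\Pi^\varepsilon$ converges in the Skorokhod topology towards Kingman's
coalescent, it now suffices to study the limiting distribution as
$\varepsilon \to 0$ of the $p$-point motion
$(\Phi^\varepsilon_{0,t}(V_1), \dots, \Phi^\varepsilon_{0,t}(V_p))$. Note
for instance that in the case $p = 1$, $\Phi^\varepsilon_{0,t}(V_1)$ is a
continuous-time random walk on $\mathbb{T}$, which jumps at rate
$\varepsilon^{-2}$, with mean zero and second moment which can be
computed as (measuring displacements in the universal cover
$\mathbb{R}$)

$$\mathbb{E}\big[(\Phi^\varepsilon_{0,t}(V_1) - V_1)^2\big] = t\,\varepsilon^{-2} \cdot \frac{\varepsilon^2}{12} = \frac{t}{12}.$$

This is enough to characterise the limiting distribution of 1-point
motions as Brownian motions with diffusion coefficient $\sqrt{1/12}$.
The rest of the theorem follows by a similar Taylor expansion when
$p \ge 2$. □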
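The second-moment computation for the 1-point motion can be checked numerically. The sketch below assumes the maps $\beta_{x,\varepsilon}$ described above and $\nu^\varepsilon = \varepsilon^{-2}\delta_\varepsilon$ (so atoms arrive at rate $\varepsilon^{-2}$ with $x$ uniform). By our own computation, the squared displacement per atom averages to $\varepsilon^3/12 + \varepsilon^2(1-\varepsilon)/12 = \varepsilon^2/12$ exactly, so the variance per unit time is $1/12$ for every $\varepsilon$, and the mean displacement vanishes by symmetry. Function names are hypothetical.

```python
import math


def displacement(y, x, eps):
    """Signed displacement of y in R/Z under beta_{x,eps}, measured in the universal cover."""
    d = (y - x + 0.5) % 1.0 - 0.5           # signed offset of y from x, in [-1/2, 1/2)
    if abs(d) < eps / 2:
        return -d                            # y is sent to x
    r = 0.5 - abs(d)                         # distance from y to the antipode of x
    return -math.copysign(r * eps / (1 - eps), d)   # y moves towards x


def moments_per_unit_time(y, eps, M=100000):
    """Mean and second moment of the displacement per unit time.

    Assumes atoms arrive at rate eps**-2 (nu = eps**-2 delta_eps) with x uniform.
    """
    rate = eps ** -2
    first = second = 0.0
    for k in range(M):
        dy = displacement(y, (k + 0.5) / M, eps)
        first += dy
        second += dy * dy
    return rate * first / M, rate * second / M


for eps in (0.3, 0.1, 0.02):
    print(eps, moments_per_unit_time(0.3, eps))   # second entry should be close to 1/12
```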
4 Analysis of Λ-coalescents

In this chapter we give some general asymptotic sampling formulae for
Λ-coalescents that are the analogues of Ewens' sampling formula in
these models. The mathematics underneath these results relies heavily
on the notion of continuous-state branching processes, which may be
thought of as the scaling limits of critical Galton-Watson processes.
After first stating these formulae, we give a basic exposition of the
theory, and develop the connection to Λ-coalescents. This connection is
then used to study in detail the small-time behaviour of Λ-coalescents,
i.e., close to the big-bang event at time $t = 0$ when the process comes
down from infinity. This raises many interesting mathematical
questions, ranging from the typical number of blocks close to $t = 0$ to
fractal phenomena associated with variations in mass of these blocks.
Surprisingly, while many questions concerning the limiting almost sure
behaviour of Λ-coalescents have an answer, our understanding of
limiting distributions is much more limited. We survey some recent
related results at the end of this chapter.

4.1 Sampling formulae for Λ-coalescents

We start by stating some general asymptotic sampling formulae for
Λ-coalescents, which are the analogues in this setting of the Ewens
sampling formula (see Theorem 2.9). We focus in this expository
section on the case of the infinite alleles model. To refresh the
reader's memory, the problem we are interested in is the following.
Consider a sample of $n$ individuals, and assume that the genealogical
relationships between these individuals are given by a Λ-coalescent,
where $\Lambda$ is an arbitrary finite measure on $(0,1)$. Conditionally
given the coalescence tree, assume that mutations fall on the tree as a
Poisson process with constant intensity $\rho$ on every branch. (In the
case of Kingman's coalescent, it is customary to parametrise this
intensity by $\theta = 2\rho$.) Recall that in the infinite alleles
model, every mutation generates a completely new allelic type. We are
interested in describing the corresponding allelic partition of our
sample, e.g., how many allelic types we are likely to observe in a
sample of size $n$, how many allelic types have a given multiplicity
$k \ge 1$ (i.e., are present in exactly $k$ individuals of this
sample), etc. Recall also that in the case of Kingman's coalescent, a
closed formula for the probability distribution of this partition is
given by Ewens' sampling formula. In the general case of
Λ-coalescents, the problem is noticeably harder, and in general no exact
closed formula is known, nor is it expected that such a formula exists.
One possible approach, investigated by Möhle [123], is to derive a
recursive formula for this distribution, which makes it possible to
compute numerically some quantities associated with it. However, it is
hard to extract useful information from it. Instead, we follow here a
different approach, which is to obtain asymptotic results when the
sample size $n$ tends to infinity.

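Our first result comes from [18] and gives the asymptotic number of
allelic types $A_n$ for a general measure $\Lambda$, subject only to the
assumption that the coalescent comes down from infinity.

Theorem 4.1. For $\lambda > 0$, let

$$\psi(\lambda) = \int_0^1 \big(e^{-\lambda x} - 1 + \lambda x\big)\, x^{-2}\, \Lambda(dx). \qquad (100)$$

As $n \to \infty$, we have the following asymptotics in probability:

$$A_n \sim \rho \int_1^n \frac{\lambda}{\psi(\lambda)}\, d\lambda, \qquad (101)$$

by which we mean that the ratio of the two sides converges to 1 in
probability.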
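The right-hand side of (101), $\rho\int_1^n \lambda/\psi(\lambda)\,d\lambda$, is a one-dimensional integral that is easy to evaluate once $\psi$ is known. The sketch below (function name hypothetical) integrates in the variable $u = \log\lambda$. As a sanity check it uses the Kingman case $\Lambda = \delta_0$. There $\psi(\lambda) = \lambda^2/2$, which is the limit of the integrand in (100) as $x \to 0$, and the prediction becomes $2\rho\log n = \theta\log n$, the classical growth of the number of alleles under Ewens' sampling formula.

```python
import math


def predicted_alleles(psi, n, rho, steps=20000):
    """rho * integral_1^n lambda / psi(lambda) d lambda (right-hand side of (101)).

    Midpoint rule in the variable u = log(lambda), so d lambda = lambda du.
    """
    h = math.log(n) / steps
    total = 0.0
    for k in range(steps):
        lam = math.exp((k + 0.5) * h)
        total += lam / psi(lam) * lam * h
    return rho * total


# Kingman case: psi(lambda) = lambda^2 / 2, so the prediction is 2 rho log n = theta log n.
for n in (10, 1000, 10 ** 6):
    print(n, predicted_alleles(lambda lam: lam * lam / 2, n, rho=0.5), math.log(n))
```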

As the reader might have recognised, the function defined in (100) is
the Laplace exponent of a Lévy process whose Lévy measure is
$x^{-2}\Lambda(dx)$. There is in fact a connection between this Lévy
process and the Λ-coalescent, which goes through the notion of
continuous-state branching process. This probabilistic connection is
interesting in itself and is developed in the next sections. A
necessary and sufficient condition for the Λ-coalescent to come down
from infinity in terms of the function $\psi$ is obtained later in
Theorem 4.9 as a consequence of this connection.

There are certain cases where one can obtain much more precise
information about the allelic partition, such as the entire asymptotic
allele frequency spectrum. This is the case where $\Lambda$ has a
property which we call (with a slight abuse of terminology) "regular
variation" near zero:

Definition 4.1. Let $\Lambda$ be a finite measure on $(0,1)$ and let
$\alpha \in (1,2)$. We say that the Λ-coalescent has regular variation
with index $\alpha$ if there exists a function $f(x)$ such that
$\Lambda(dx) = f(x)\,dx$ and a number $A > 0$ such that

$$f(x) \sim A x^{1-\alpha} \qquad (102)$$

as $x \to 0$.

In that case, we obtain the following result. Let $\rho > 0$ and assume
that $\Lambda$ satisfies (102) for some $1 < \alpha < 2$. Let $\Pi$ be
the random infinite allelic partition obtained by throwing a Poisson
process of mutations on the infinite coalescent tree with constant
mutation rate $\rho$. For $n \ge 1$ and $1 \le k \le n$, let $A_n(k)$ be
the number of blocks of size $k$ of $\Pi|_{[n]}$. Thus $A_n(k)$ is the
number of allelic types of multiplicity $k$ in the first $n$
individuals of an infinite sample, under the infinite alleles model.
Let also $A_n = \sum_{k=1}^n A_n(k)$ be the total number of allelic
types, as above. Note in particular that the random variables
$(A_n)_{n \ge 1}$ (resp. $(A_n(k))_{n \ge 1}$ for any $k \ge 1$) are now
all simultaneously constructed on a common probability space, so that
it now makes sense to talk about almost sure convergence in (101). This
coupling is natural in that it corresponds to revealing more and more
data from a large sample, and is thus suitable for applications.

Theorem 4.2. ([20], [18]) Under assumption (102) we have, almost
surely as $n \to \infty$,

$$\frac{A_n}{n^{2-\alpha}} \longrightarrow \rho C, \qquad (103)$$

where $C = \alpha(\alpha-1)/[A\,\Gamma(2-\alpha)(2-\alpha)]$. Moreover,
for every fixed $k \ge 1$:

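$$\frac{A_n(k)}{n^{2-\alpha}} \longrightarrow \rho C\, \frac{(2-\alpha)(\alpha-1)\cdots(\alpha+k-3)}{k!} \qquad (104)$$

almost surely as $n \to \infty$. Moreover, if $P_1 \ge P_2 \ge \dots$
denote the ordered allele frequencies in the population, then

$$P_j \sim C' j^{-1/(2-\alpha)}, \qquad (105)$$

almost surely as $j \to \infty$, where
$C' = (\rho C/\Gamma(\alpha-1))^{1/(2-\alpha)}$.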
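Several consistency checks on Theorem 4.2 are easy to run. First, if $f(x) \approx A x^{1-\alpha}$ near 0, then for large $\lambda$, $\psi(\lambda) \approx A\,\Gamma(2-\alpha)\lambda^\alpha/(\alpha(\alpha-1))$. This uses $\int_0^\infty (e^{-u} - 1 + u)u^{-1-\alpha}du = \Gamma(2-\alpha)/(\alpha(\alpha-1))$. Plugging this into $\rho\int_1^n \lambda/\psi(\lambda)\,d\lambda$ gives $\rho C(n^{2-\alpha} - 1)$ with $C = \alpha(\alpha-1)/[A\Gamma(2-\alpha)(2-\alpha)]$. Second, the limiting fractions $A_n(k)/A_n = (2-\alpha)(\alpha-1)\cdots(\alpha+k-3)/k!$ are the coefficients of $1 - (1-s)^{2-\alpha}$, so they sum to 1. Third, for the Beta$(2-\alpha,\alpha)$-coalescent, $A = 1/(\Gamma(2-\alpha)\Gamma(\alpha))$ and $C = \alpha(\alpha-1)\Gamma(\alpha)/(2-\alpha)$. The sketch below checks these numerically. It is our own verification, not code from [18] or [20], and all names are hypothetical.

```python
import math


def C_const(alpha, A):
    """Constant C of (103): alpha(alpha - 1) / (A Gamma(2 - alpha) (2 - alpha))."""
    return alpha * (alpha - 1) / (A * math.gamma(2 - alpha) * (2 - alpha))


def rhs_101_power_law(alpha, A, n, rho=1.0, steps=20000):
    """rho * int_1^n lambda / psi(lambda) d lambda, psi replaced by its large-lambda form."""
    c = A * math.gamma(2 - alpha) / (alpha * (alpha - 1))
    h = math.log(n) / steps
    total = 0.0
    for k in range(steps):
        lam = math.exp((k + 0.5) * h)          # midpoint rule in u = log(lambda)
        total += lam / (c * lam ** alpha) * lam * h
    return rho * total


def spectrum(alpha, kmax):
    """Limiting fractions A_n(k) / A_n = (2-alpha)(alpha-1)...(alpha+k-3) / k!, k = 1..kmax."""
    fractions = []
    term = 2 - alpha                        # k = 1
    for k in range(1, kmax + 1):
        if k > 1:
            term *= (alpha + k - 3) / k     # ratio of consecutive terms
        fractions.append(term)
    return fractions


def alpha_from_singletons(singleton_fraction):
    """Estimator suggested by (104) with k = 1: alpha is about 2 - A_n(1) / A_n."""
    return 2 - singleton_fraction


alpha = 1.4
print(spectrum(alpha, 4))
print(alpha_from_singletons(spectrum(alpha, 1)[0]))
```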



Comments. (1) The convergence in probability was first obtained in [20]
in the case of Beta-coalescents with parameters $(2-\alpha, \alpha)$,
using an exact embedding within the so-called stable Continuum Random
Tree and an analysis of the mutation process using queues. However,
these methods were limited to the case of Beta-coalescents and did not
yield the almost sure limit. This extension was done in [17] and [18]
using a different, martingale-based approach to the problem. The same
result also holds for the frequency spectrum of the infinite sites
model.

(2) Taking $k = 1$, we see that the fraction of singletons in the
allelic partition satisfies $A_n(1)/A_n \to 2 - \alpha$ almost surely as
$n \to \infty$. Thus $\alpha$ can be measured in a straightforward
fashion from a sample, because it is approximately 2 minus the
proportion of alleles with multiplicity 1. Note that in Kingman's
coalescent, this fraction is asymptotically $1/\log n$, and hence tends
to 0 as $n \to \infty$. Thus if the fraction of singletons is not
negligible in a particular data set, this is a good indication that
Kingman's coalescent is not suitable for these data and that
coalescents with multiple collisions are better approximations. Various
data sets from Pacific oysters suggest a value of $\alpha$ of
approximately 1.4. See, however, the discussion in Example 4.2 of [66]
and the work of Birkner and Blath [34] for further investigation (but
in the case of the infinite sites model).

(3) In the case where the Λ-coalescent does not come down from
infinity, Möhle [124] has obtained a limiting result for the number of
allelic types if in addition one assumes that
$\int_0^1 x^{-1}\Lambda(dx) < \infty$, i.e., by Theorem 3.5, if there is
almost surely dust at any time in the coalescent. In that case, he was
able to show that

$$\frac{A_n}{n} \longrightarrow A$$

in distribution as $n \to \infty$, where $A$ has a distribution which
can be described as follows: let $X_t = -\log S_t$, where $S_t$ is the
mass of dust at time $t$ (and note that $X_t$ is then a subordinator).
Then $A$ has the same distribution as
$\rho\int_0^\infty e^{-\rho t - X_t}\, dt$. (A similar result has been
shown by Freund and Möhle [85] for coalescents with simultaneous
multiple collisions that have dust.) The condition that the coalescent
has dust excludes cases such as the Bolthausen-Sznitman coalescent, but
this particular example has been analysed in detail by Basdevant and
Goldschmidt [12] and, with slightly less precise results, by Drmota et
al. [69].
4.2 Continuous-state branching processes
4.2.1 Definition of CSBPs 

In this section, we backtrack a little to give an introduction to
Continuous-State Branching Processes (or CSBPs for short), which we
will then use to give a flavour of some of the proofs of Theorem 4.1
and Theorem 4.2.

In a nutshell, CSBPs can be seen as generalisations and/or scaling
limits of Galton-Watson processes. Our presentation departs from the
classical one, in that we have chosen not the most elegant approach but
the most effective one. In particular, it saves us the need to later
introduce the technology of Continuum Random Trees, which would force
us to get significantly more technical than these notes are meant to
be. However, this theory is extremely elegant and many ideas described
below are more natural when seen through this particular angle, so we
have included in the appendix some notions about these objects.

As mentioned above, a continuous-state branching process is a
continuous analogue of Galton-Watson processes. That is, the
population size is now a continuous variable which takes its values in
the set $\mathbb{R}_+$ (as opposed to the set $\mathbb{Z}_+$ for
Galton-Watson processes). To define it properly, we first make the
following observation in the discrete case. Let $(Z_n, n \ge 0)$ be a
Galton-Watson process with offspring distribution $L$. Then, given
$Z_n$, one may write $Z_{n+1}$ as $Z_{n+1} = \sum_{i=1}^{Z_n} L_i$,
where the $L_i$ are i.i.d. random variables distributed as $L$, so

$$Z_{n+1} - Z_n = \sum_{i=1}^{Z_n} X_i, \qquad (106)$$

where $X_i = L_i - 1$. If we view the $X_i$ as the steps of the random
walk $S_m = Z_0 + \sum_{i=1}^m X_i$ (using fresh steps of a single
i.i.d. sequence at each generation), then (106) tells us that we can
view $Z$ as a time-change of the random walk $(S_m, m \ge 0)$, where to
obtain $Z_{n+1}$ from $Z_n$ we run the random walk $S$ for exactly $Z_n$
steps. That is, one may write

$$Z_n = S_{\tau_n}, \quad n \ge 0, \qquad (107)$$

where $\tau_n = Z_0 + Z_1 + \cdots + Z_{n-1}$ (with $\tau_0 = 0$).
Similarly, if $(Z_t, t \ge 0)$ is a Galton-Watson process in continuous
time (i.e., individuals branch at rate $a > 0$ and give birth to i.i.d.
offspring with distribution $L$), then one can associate a
continuous-time random walk which jumps at rate $a$ by an amount
distributed as $X = L - 1$. Thus, given $Z_t = z$, the rate at which
$Z_t$ jumps to $z + x$ is simply $z$ times the rate at which the random
walk $(S_t, t \ge 0)$ jumps from $z$ to $z + x$.
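The discrete time-change (107) can be verified directly: run a Galton-Watson process, record the steps $X = L - 1$ in the order in which they are used, build the walk $S$ and the times $\tau_n$, and check that $Z_n = S_{\tau_n}$ for every generation. The Poisson(1) offspring law, the starting size and the function names below are arbitrary choices for illustration.

```python
import math
import random


def poisson(rng, mean=1.0):
    """Poisson sample by Knuth's multiplication method (fine for small means)."""
    limit, k, prod = math.exp(-mean), 0, 1.0
    while True:
        prod *= rng.random()
        if prod < limit:
            return k
        k += 1


def galton_watson_and_walk(Z0, generations, rng):
    """Run a Galton-Watson process and record its steps X = L - 1 in order of use."""
    Z, steps = [Z0], []
    for _ in range(generations):
        offspring = [poisson(rng) for _ in range(Z[-1])]   # assumption: Poisson(1) offspring
        steps.extend(L - 1 for L in offspring)
        Z.append(sum(offspring))
    S = [Z0]                                 # S_m = Z0 + X_1 + ... + X_m
    for x in steps:
        S.append(S[-1] + x)
    tau = [0]                                # tau_n = Z_0 + ... + Z_{n-1}
    for z in Z[:-1]:
        tau.append(tau[-1] + z)
    return Z, S, tau


rng = random.Random(5)
Z, S, tau = galton_watson_and_walk(Z0=20, generations=15, rng=rng)
print(all(Z[n] == S[tau[n]] for n in range(len(Z))))
```

The identity holds pathwise, not just in law, which is what makes the continuous analogue below (replacing $S$ by a spectrally positive Lévy process) natural.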

It is this last representation which we want to copy in the continuous
setting. The random walk $(S_t, t \ge 0)$ will be replaced by a process
with independent and stationary increments $(Y_t, t \ge 0)$, i.e., a
Lévy process, which will have the property that all its jumps are
nonnegative: the jumps of the random walk are always greater than or
equal to $-1$, and this $-1$ vanishes in the scaling limit, so that only
nonnegative jumps survive. On the other hand, the positive jumps of $Y$
can be arbitrarily large. Thus let us fix $(Y_t, t \ge 0)$ a Lévy
process with no negative jumps (one also says spectrally positive), and
let $\nu(dx)$ be the Lévy measure of $Y$. That is,

]E^g-A(y,-yo)) = exp(-tV(A)), 

where 

^(A) = a h6A+ / (e'^'-^ -1 + Xxli^<n)u{dx). 

2 Jo 

(Recall that z/ is a Levy measure means that f^{h'^ A l)v'{dh) < oo.) 
This is the Levy-Khintchine formula already discussed in the proof of Theorem 3.1. To simplify our presentation, we assume in what follows that $a = b = 0$. Then the corresponding Levy process may be characterized by its generator $G$ (see Sato [140], pp. 205-212). If $f$ is a rapidly decreasing function (an element of the Schwartz space, but there is no harm in thinking of a $C^\infty$ function with compact support; such functions form a core for the generator), then

$$ Gf(x) = \int_0^\infty \big[f(x+h) - f(x) - h\mathbf{1}_{\{h<1\}} f'(x)\big]\,\nu(dh). $$

Essentially, this formula means that jumps of size $h$ occur at rate $\nu(dh)$. Thanks to an ingenious compensation mechanism, it is enough to require that the sum of the squares of the jumps smaller than 1, up to a given time, has finite expectation. We define the associated branching process as follows:

Definition 4.2. For $f \in C^\infty$ with compact support, let $\mathcal{L}f(z) = z\,Gf(z)$. We call continuous-state branching process associated with $\psi$ any process $(Z_t, t \ge 0)$ with values in $\mathbb{R}_+$ such that

$$ f(Z_t) - \int_0^t \mathcal{L}f(Z_s)\,ds $$

is a martingale with respect to the natural filtration of $Z$, for any $f \in C^\infty$ with compact support. This property uniquely determines the law of $(Z_t, t \ge 0)$, which is called the $\psi$-continuous-state branching process, or $\psi$-CSBP for short. The function $\psi$ is called the branching mechanism of $Z$.

It can be shown that $Z$ is a Markov process. This definition means that the transitions of a $\psi$-CSBP are the same as those of a Levy process with Laplace exponent $\psi(\lambda)$, but they occur $z$ times as fast, where $z$ is the current size of the process. Note that CSBPs get absorbed at $Z_t = 0$, because the rate of jumps is then 0. The following result, which goes back to Lamperti, explains this further by establishing the analogue of (107) in continuous space.

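Theorem 4.3. Let $Z$ be a $\psi$-continuous-state branching process, and let $(Y_t, t \ge 0)$ be the associated Levy process, started from $Y_0 = Z_0$. Let $T$ be the hitting time of 0 by $Y$, let $U(t) = \int_0^{t \wedge T} Y_s^{-1}\,ds$, and let $U^{-1}(t)$ be the cadlag inverse of $U$, i.e.,

$$ U^{-1}(t) = \inf\{s > 0 : U(s) > t\}. $$

Then

$$ (Z_t)_{t \ge 0} = \big(Y_{U^{-1}(t)}\big)_{t \ge 0}, \qquad (108) $$

where the equality holds in distribution.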

It is easy to check this result. Everything is designed so that the process defined by the right-hand side runs its clock at speed $z$ when $Z_t = z$, and apart from this time-change it has the same transitions as the Levy process $(Y_t, t \ge 0)$. If we want to emphasize the starting point of the CSBP, we write $Z_t(x)$ to mean that the process was started at $Z_0 = x > 0$. As a simple consequence of this definition, we get the following property:

Theorem 4.4. Let $(Z_t(x), t \ge 0)$ be a $\psi$-CSBP. Then $Z$ enjoys the branching property. That is, let $x, y > 0$ and let $Z'(y)$ denote a $\psi$-continuous-state branching process started from $y$ and independent of $Z$. Then we have the representation

$$ Z(x+y) = Z(x) + Z'(y), \qquad (109) $$

where the equality is an identity in distribution for the processes.
Proof. The right-hand side is also a Markov process. Since the jump rates are linear in the current size, the rates of the two independent summands add up to the correct transition rates. Hence the right-hand side is a $\psi$-CSBP, and its starting point is $x + y$. Thus the law of the right-hand side is identical to the law of $Z(x+y)$. □

The meaning of the branching property (109) is as follows. If the initial population is $x + y$, we can think of the two subpopulations of sizes $x$ and $y$ as evolving independently of one another, and their sum gives the total population $Z(x+y)$. This is why it is often convenient to record the initial population $x$ by writing $Z(x)$. Theorem 4.4 is traditionally used as the definition of CSBPs. This is indeed the definition used by Jiřina in 1958 [101], where these processes were first discussed: a continuous-state branching process is any Markov process on $\mathbb{R}_+$ which enjoys the branching property. That the two definitions are equivalent is a sequence of theorems due to Lamperti [110, 111]. We have preferred to use Definition 4.2 because it makes the role of the measure $\nu$ more immediately transparent, and because the properties of CSBPs can then be established much more directly, as we will see in what follows. When Theorem 4.4 is used as the definition, Lamperti's transformation theorem (Theorem 4.3) is far from obvious, and in fact the proof in [111] misses a few cases. A recent paper by Caballero, Lambert and Uribe Bravo [49] contains several proofs and a thorough discussion.

Theorem 4.5. (Lamperti [110]) Any continuous-state branching process $(Z_t, t \ge 0)$ started from $Z_0 = 1$ is a scaling limit of Galton-Watson processes. That is, there exist a sequence of offspring distributions $L^{(N)}$ ($N \ge 1$) and a sequence of numbers $c_N$ with the following property. If $Z^{(N)}$ denotes the Galton-Watson process with offspring distribution $L^{(N)}$ started from $N$ individuals, then

$$ \big(N^{-1} Z^{(N)}_{\lfloor c_N t \rfloor}, t \ge 0\big) \longrightarrow (Z_t, t \ge 0) $$

weakly in the Skorokhod topology.

Proof (sketch). This is a rather simple consequence of the classical fact that any Levy process can be approximated by a suitable random walk. That is, there exist a sequence of step distributions $X^{(N)}$ and constants $c_N$ such that $(N^{-1} S^{(N)}_{\lfloor c_N t \rfloor}, t \ge 0)$ converges weakly in the Skorokhod topology towards the Levy process $(Y_t, t \ge 0)$, where $(S^{(N)}_i, i \ge 0)$ is the random walk with step distribution $X^{(N)}$. Furthermore, $X^{(N)}$ can be chosen to be integer-valued and "skip-free", in the sense that $X^{(N)} \ge -1$ almost surely. The offspring distribution $L^{(N)}$ is then simply defined as $L^{(N)} = X^{(N)} + 1$. The representation (107) then tells us that the relation (108) must hold in the limit, and the result follows. □

An important example of a continuous-state branching process is given by the class of $\alpha$-stable processes:

Definition 4.3. The stable CSBP with index $\alpha \in (1,2)$ is the continuous-state branching process associated with the stable Levy measure

$$ \nu(dx) = \frac{\alpha(\alpha-1)}{\Gamma(2-\alpha)}\, x^{-1-\alpha}\,dx. \qquad (110) $$

In this case, the branching mechanism is $\psi(u) = Cu^\alpha$ for some $C > 0$.

In fact, if $\psi(u) = u^2$ (quadratic branching), it is still possible to define a corresponding CSBP. In that case the process is naturally related to Brownian motion: it is nothing else but the Feller diffusion (see Theorem A.6 in the appendix). We still speak of the 2-stable branching process in this case.

We now come to an interesting property, which shows a relation between branching processes and a certain differential equation. It turns out that this differential equation lies at the heart of the analysis of $\Lambda$-coalescents.

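Theorem 4.6. Let $Z$ be a $\psi$-continuous-state branching process. Then for all $\lambda > 0$,

$$ E\big(\exp(-\lambda Z_t(x))\big) = \exp(-x\,u_t(\lambda)), \qquad (111) $$

where the function $t \mapsto u_t(\lambda)$ is the solution of the differential equation

$$ \frac{\partial}{\partial t} u_t(\lambda) = -\psi(u_t(\lambda)), \qquad u_0(\lambda) = \lambda. \qquad (112) $$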

Remark. This connection is the prototype of some deeper links 
which arise when the branching process is endowed with some ad- 
ditional geometric structure, in which case the differential equation 
becomes a partial differential equation: see, e.g., Le Gall [112]. 
Proof. Define $F(t, x, \lambda)$ by $E_x(e^{-\lambda Z_t}) = \exp(-F(t,x,\lambda))$. By the branching property, it is easy to see that $F(t,x,\lambda)$ must be linear in $x$: that is, there exists $f_t(\lambda)$ such that $F(t,x,\lambda) = x f_t(\lambda)$. The Markov property then shows that $f_t(f_s(\lambda)) = f_{t+s}(\lambda)$. So Theorem 4.6 can be rephrased as follows: any solution to this functional equation must in fact satisfy (112) for some Laplace exponent $\psi(\lambda)$ of a spectrally positive Levy process $(Y_t, t \ge 0)$.

To see why this is true, we go back to the discrete case, where the argument is somewhat more transparent. Thus consider a Galton-Watson process $(Z_t, t \ge 0)$ in continuous time, in which each individual branches at rate $a > 0$ and leaves i.i.d. offspring distributed according to $(p_k)_{k \ge 0}$. Let $Q = (q_{ij})_{i,j \ge 0}$ be the generator of the process. Then $q_{ii} = -ia + iap_1$ (since nothing happens when an individual branches and leaves 1 offspring), and $q_{ij} = iap_{j-i+1}$ if $j \ge i-1$ and $j \ne i$. The Kolmogorov backward equation $P'(t) = QP(t)$ shows that

$$ P_{1j}'(t) = \sum_{k \ge 0} q_{1k} P_{kj}(t) = -aP_{1j}(t) + \sum_{k \ge 0} a p_k P_{kj}(t). $$

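Therefore, consider the probability generating function of $Z_t$, i.e., $F(s,t) = E(s^{Z_t} \mid Z_0 = 1) = E_1(s^{Z_t})$. We observe that

$$ \begin{aligned} \frac{\partial F}{\partial t}(s,t) &= \sum_{j \ge 0} P_{1j}'(t)\,s^j = -aF(s,t) + a\sum_{j \ge 0}\sum_{k \ge 0} p_k P_{kj}(t)\,s^j \\ &= -aF(s,t) + a\sum_{k \ge 0} p_k E_k(s^{Z_t}) \quad \text{by Fubini's theorem} \\ &= -aF(s,t) + a\sum_{k \ge 0} p_k F(s,t)^k \quad \text{by the branching property.} \end{aligned} $$

Thus, let $\phi(u) = a(u - g(u))$, where $g(u) = \sum_{k \ge 0} p_k u^k$ is the probability generating function of $(p_k)$. Then

$$ \frac{\partial F}{\partial t} = -\phi(F(s,t)), \qquad (113) $$

which is the precursor of (112). Using the discrete approximation of Theorem 4.5 and taking the limit in (113), we obtain (112). □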
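As a sanity check of (113), take binary critical branching, $p_0 = p_2 = 1/2$; this example is our choice, for illustration. Then $g(u) = (1+u^2)/2$ and $\phi(u) = -a(1-u)^2/2$. Solving (113) with $F(s,0) = s$ gives $1 - F(s,t) = (1-s)/(1 + a(1-s)t/2)$, so at $s = 0$ we get $P(Z_t = 0) = (at/2)/(1 + at/2)$. The Python sketch below estimates this probability with an event-by-event (Gillespie-type) simulation and compares it with the formula. The function names, seeds and number of runs are assumptions of the sketch.

```python
import random


def extinction_probability(t, a):
    """P(Z_t = 0 | Z_0 = 1) from the solution of (113) at s = 0, for p_0 = p_2 = 1/2."""
    return (a * t / 2) / (1 + a * t / 2)


def extinct_by(t, a, rng):
    """Simulate one continuous-time binary Galton-Watson path started from one individual."""
    z, clock = 1, 0.0
    while z > 0:
        clock += rng.expovariate(a * z)  # each of the z individuals branches at rate a
        if clock > t:
            return False
        z += 1 if rng.random() < 0.5 else -1  # the branching individual leaves 2 or 0 children
    return True


def estimate_extinction(t, a, runs, seed=0):
    """Monte Carlo estimate of P(Z_t = 0 | Z_0 = 1)."""
    rng = random.Random(seed)
    return sum(extinct_by(t, a, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    t, a = 2.0, 1.0
    print(estimate_extinction(t, a, runs=20_000), extinction_probability(t, a))
```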

This further explains the role of the branching mechanism $\psi$: it may be interpreted as the Laplace exponent of the infinitesimal offspring distribution.
When $Z$ is the $\alpha$-stable CSBP and $\alpha = 2$ (quadratic branching), the differential equation (112) is $\dot u = -cu^2$, which should look familiar: it is precisely the differential equation that was obtained in the heuristic analysis of Kingman's coalescent in (23). This is of course not a coincidence. In Section 4.3 we will develop a connection between $\Lambda$-coalescents and $\psi$-CSBPs (for a certain branching mechanism $\psi$ to be determined). This connection will finally make the analogy rigorous, and many other properties will follow from it.
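The Python sketch below integrates (112) numerically with the classical fourth-order Runge-Kutta scheme. It compares the result with two closed forms obtained by separating variables. For $\psi(u) = cu^2$ one gets $u_t(\lambda) = \lambda/(1 + c\lambda t)$. For $\psi(u) = u^\alpha$ with $\alpha \in (1,2)$ one gets $u_t(\lambda) = (\lambda^{1-\alpha} + (\alpha-1)t)^{-1/(\alpha-1)}$. Letting $\lambda \to \infty$ in the first formula gives $u_t(\lambda) \to 1/(ct)$, hence $P(Z_t(x) = 0) = e^{-x/(ct)}$. The function names and the default step count are our assumptions. For very large $\lambda$ this explicit scheme needs many more steps to stay stable, so the limit is illustrated with the closed form only.

```python
def solve_u(psi, lam, t, steps=2000):
    """Approximate u_t(lam) solving du/dt = -psi(u), u_0 = lam, by classical RK4."""
    h = t / steps
    u = lam
    for _ in range(steps):
        k1 = -psi(u)
        k2 = -psi(u + 0.5 * h * k1)
        k3 = -psi(u + 0.5 * h * k2)
        k4 = -psi(u + h * k3)
        u += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return u


def u_quadratic(lam, t, c):
    """Closed-form solution of (112) for psi(u) = c u^2."""
    return lam / (1 + c * lam * t)


def u_stable(lam, t, alpha):
    """Closed-form solution of (112) for psi(u) = u^alpha, 1 < alpha < 2."""
    return (lam ** (1 - alpha) + (alpha - 1) * t) ** (-1 / (alpha - 1))


if __name__ == "__main__":
    print(solve_u(lambda u: 0.5 * u * u, 3.0, 2.0), u_quadratic(3.0, 2.0, 0.5))
    print(solve_u(lambda u: u ** 1.5, 4.0, 1.0), u_stable(4.0, 1.0, 1.5))
    print(u_quadratic(1e12, 1.0, 0.5))  # close to 1 / (c t) = 2 for huge lambda
```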

We end this section on the basic properties of branching processes with a necessary and sufficient condition for the process to become extinct. When the process does become extinct, we also give the probability that this has already happened by a given time $t$. Here, becoming extinct means that there is a finite $T > 0$ such that $Z_T = 0$ (since 0 is absorbing, $Z_t = 0$ for all $t \ge T$ automatically).



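Theorem 4.7. (Grey's criterion [94]) Let $Z$ be a $\psi$-CSBP. Then $Z$ becomes extinct in finite time almost surely if and only if

$$ \int_1^\infty \frac{dq}{\psi(q)} < \infty. \qquad (114) $$

Let $P_z(t)$ denote the probability that $Z_t > 0$, given $Z_0 = z$. Then $P_z(t) = 1 - \exp(-z v(t))$, where $v(t)$ is defined by

$$ \int_{v(t)}^\infty \frac{dq}{\psi(q)} = t. \qquad (115) $$

Proof (sketch). Note that

$$ P(Z_t = 0) = \lim_{\lambda \to \infty} E(e^{-\lambda Z_t}) = \lim_{\lambda \to \infty} e^{-z u_t(\lambda)}, $$

where $u_t(\lambda)$ is the solution to the differential equation (112). Observe that this differential equation can be solved explicitly:

$$ \frac{\dot u_s}{\psi(u_s)} = -1. $$

We integrate between times 0 and $t$ and make the change of variables $q = u_s$. Since $u_0(\lambda) = \lambda$, we find

$$ \int_{u_t(\lambda)}^{\lambda} \frac{dq}{\psi(q)} = t. \qquad (116) $$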
Thus if (114) does not hold, it must be that $\lim_{\lambda \to \infty} u_t(\lambda) = \infty$, since the right-hand side of (116) does not depend on $\lambda$. Hence $P(Z_t = 0) = 0$ for all $t > 0$. On the other hand, if (114) holds, then $\lim_{\lambda \to \infty} u_t(\lambda) = v(t)$ as defined by (115). Thus there is a positive probability of extinction by time $t > 0$, equal to $\exp(-z v(t))$. One must work slightly harder to show that eventual extinction has probability 1. □
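For instance, take $\psi(q) = Cq^\alpha$ with $C > 0$ and $\alpha \in (1,2]$; this covers the stable branching mechanisms of Definition 4.3 and quadratic branching. Then (114) holds, and (115) can be solved in closed form:

$$ \int_{v}^\infty \frac{dq}{Cq^\alpha} = \frac{v^{1-\alpha}}{C(\alpha-1)} = t \quad\Longrightarrow\quad v(t) = \big(C(\alpha-1)\,t\big)^{-1/(\alpha-1)}. $$

In the quadratic case $\alpha = 2$ this reads $v(t) = 1/(Ct)$, so $P(Z_t = 0 \mid Z_0 = z) = \exp(-z/(Ct))$. This agrees with letting $\lambda \to \infty$ in $u_t(\lambda) = \lambda/(1 + C\lambda t)$, the solution of (112) for $\psi(u) = Cu^2$.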

Note that the situation is slightly more complicated in the continuous world than in the discrete one. For instance, (114) may fail while the process still has $Z_t \to 0$ as $t \to \infty$: $Z_t(x) = xe^{-t}$ is such a CSBP, with $\psi(u) = u$. On the other hand, there can also be explosions; these are investigated by Silverstein [146].

4.2.2 The Donnelly-Kurtz lookdown process 

Let $(Z_t, t \ge 0)$ be a $\psi$-continuous-state branching process. We would like to have a notion of genealogy for $Z$, i.e., a way of making sense of the intuitive idea that, in this evolving population, some individuals descend from a certain group of individuals at some earlier time. This turns out to be rather delicate in the continuous setting, and requires some additional technology. There are essentially two ways to proceed.

One is to introduce continuum random trees, i.e., trees that branch continuously in time, such that the total population process has the law of $(Z_t, t \ge 0)$. A brief overview of this approach is presented in the appendix. The other possibility, which we have chosen to discuss here, is Donnelly and Kurtz's so-called (generalised) lookdown process. That these two notions of genealogy coincide is a theorem proved in [19] (see Theorem 12 in that paper).

For the developments we have in mind (i.e., the analysis of $\Lambda$-coalescents through a connection with CSBPs), the approach proposed here no longer relies on Continuum Random Trees, so the reader may ignore this aspect. We stress, however, that the connection initially relied on the continuum random tree approach (in [19, 20]) rather than on the Donnelly-Kurtz lookdown approach developed here, which owes much to the work of [18]. Hopefully the latter approach makes these ideas more transparent.

We now describe the lookdown process. It was originally introduced in [63] to provide a countable system of particles whose empirical measure would be a version of some predetermined measure-valued process (and CSBPs can precisely be viewed as one such example). This representation is very much in the spirit of the classical Fleming-Viot representation of Theorems 3.11 and 3.12. One way to describe this process is through the following question. Suppose a continuous population evolves according to the dynamics of a CSBP: what would samples from that population look like as time evolves?

To ask the question more precisely, we may as usual endow each individual initially present with a unique allelic type (which, for us, will simply be a uniform random variable on $(0,1)$). We sample from the population at time 0 and run the population dynamics for some time. How has that sample evolved? Our only requirement is that each allele at time $t > 0$ is represented with the correct frequency in our sample; otherwise we may run the dynamics as we wish.

The answer is as follows. Assume for simplicity that $Z$ has only jumps and no Brownian component. Suppose that at some time $t > 0$, the population $Z$ has a jump of size $\Delta Z_t > 0$. This jump is produced by "one individual" who has a macroscopic offspring of size $\Delta Z_t$. In the population right after the jump, a fraction

$$ p = \frac{\Delta Z_t}{Z_t} $$

has just descended from that individual and thus carries the type of this individual. The other individuals in the population have not died, but their relative frequency is now only $(1-p)$. Thus if an allele occupied a fraction $a$ of the population, it now occupies a fraction $a(1-p)$. One can check that the following procedure produces exactly the desired change:

1. A proportion $p$ of individuals is selected by independent coin tosses. (They are said to participate in the birth event.) The type of those individuals is changed (if need be) to the type of the lowest individual participating in the birth event.

2. The other types are shifted upwards to the first available non-participating level. (See Figure 9.)
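The two rules above can be sketched in Python for finitely many levels. This is an illustration under our own assumptions: the function names are hypothetical, and only levels $1, \dots, n$ are tracked. Types pushed above level $n$ are simply dropped, which mimics the disappearance "to infinity" discussed below. With levels 2, 4 and 5 participating (the configuration of Figure 9), the rules turn the types $1, 2, \dots, 7$ into $1, 2, 3, 2, 2, 4, 5$.

```python
import random


def lookdown_birth_event(types, participants):
    """Apply one lookdown birth event to the types carried by levels 1..n.

    types[i] is the type at level i + 1; participants is a set of 1-based levels.
    """
    n = len(types)
    levels = sorted(l for l in participants if 1 <= l <= n)
    if len(levels) < 2:
        return list(types)  # zero or one participant: nothing changes
    low = levels[0] - 1  # 0-based index of the lowest participating level
    joined = {l - 1 for l in levels}
    old_above = iter(types[low + 1:])  # old types strictly above the lowest participant, in order
    new = list(types[:low + 1])  # levels up to the lowest participant keep their types
    for i in range(low + 1, n):
        if i in joined:
            new.append(types[low])  # rule 1: copy the type of the lowest participant
        else:
            new.append(next(old_above))  # rule 2: the next old type moves up to this level
    return new


def random_birth_event(types, p, rng):
    """Each level participates independently with probability p (the coin tosses of rule 1)."""
    participants = {i + 1 for i in range(len(types)) if rng.random() < p}
    return lookdown_birth_event(types, participants)


if __name__ == "__main__":
    print(lookdown_birth_event([1, 2, 3, 4, 5, 6, 7], {2, 4, 5}))
```

Iterating `random_birth_event` never creates new types; it can only duplicate existing types and push old ones out of the tracked window.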

Figure 9: Representation of the lookdown process. Levels 2, 4 and 5 participate in a birth event. Other types get shifted upwards. The numbers on the left and on the right indicate the types before and after the birth event.

This procedure describes what happens at any single jump time of the population $Z$. Applying it successively at all jump times of the population, we obtain a countable system $(\xi_i(t), i \ge 1)$, where $\xi_i(t)$ is the type of the individual at level $i$. Naturally, at any time $t > 0$, $\{\xi_i(t)\}$ forms a subcollection of the initial types $\{U_i\}$.

A priori it seems that no type can ever be lost in this construction, since types just get shifted upwards and never die. This is not accurate. Jumps may accumulate and push a given type "to infinity", at which point it has disappeared from the population and does not exist anymore. In fact, we will see that under Grey's condition (114), the number of types that survive up to any time $t > 0$ is finite almost surely. Since the initial collection of types is exchangeable and the rest of the evolution is determined by i.i.d. coin tosses, we immediately get that $(\xi_i(t))_{i \ge 1}$ is an infinite exchangeable sequence for each $t > 0$.

Definition 4.4. The almost sure limit

$$ \Xi_t = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \delta_{\xi_i(t)} $$

(which exists by de Finetti's theorem, since $(\xi_i(t))_{i \ge 1}$ is exchangeable) is called the Donnelly-Kurtz (generalised) lookdown process.

Everything has been designed so that $\Xi_t$ accurately represents the composition of types in the population $Z$, so it remains to state the theorem. First, we associate to the continuous-state branching
process $(Z_t, t \ge 0)$ a measure-valued process $(M_t, t \ge 0)$, where each $M_t$ is a measure on $(0, \infty)$. Informally speaking, for $r \in [0,1]$, $M_t([0,r])$ is the size of the population descended from the first $r$ individuals of the population, i.e., from the individuals initially located in the interval $[0,r]$. One way to define it is to construct a process $(Z_t(x))_{t \ge 0, x \ge 0}$ simultaneously for all $t$ and all $x \ge 0$, using the branching property. For instance, to construct $Z_t(x)$ and $Z_t(x+y)$ simultaneously, we start with a version of $Z(x)$ and add an independent version of $Z(y)$; the sum of these two processes is used as $Z(x+y)$. This construction uniquely defines a measure $M_t$ on $\mathbb{R}_+$ such that

$$ M_t([x,y]) = Z_t(y) - Z_t(x), \qquad 0 \le x < y \le 1. $$

There is another way to represent the composition of the population, obtained from $M_t$ by looking at the ratio process:

$$ R_t([0,r]) = \frac{M_t([0,r])}{Z_t}, \qquad r \le 1, $$

where $Z_t = M_t([0,1])$ is the total mass. Thus $R_t$ is a measure with total mass equal to 1.

Theorem 4.8. The ratio process $(R_t, t \ge 0)$ and the lookdown process $(\Xi_t, t \ge 0)$ have the same distribution.

A consequence of this representation is the following result about the Donnelly-Kurtz lookdown process. It tells us that the number of types which survive to time $t > 0$ is finite if and only if the associated branching process $(Z_t, t \ge 0)$ dies out almost surely.

Corollary 4.1. Consider the number of types surviving at time $t > 0$ in the Donnelly-Kurtz lookdown process $(\Xi_t, t \ge 0)$, i.e., the number of atoms of $\Xi_t$. It is finite for all $t > 0$ almost surely if and only if Grey's condition (see Theorem 4.7) is fulfilled. Moreover, if Grey's condition holds, then the number of types which survive at time $t$ is Poisson with mean $v(t) < \infty$, where $v$ is the function defined in (115).

Proof. Fix some $n \ge 1$, and divide the interval $[0,1]$ into $n$ subintervals $[i/n, (i+1)/n]$, $0 \le i \le n-1$. In the Donnelly-Kurtz lookdown process, we
identify all individuals whose types fall into the same subinterval. Thus, by the branching property, we have $n$ independent copies of a $\psi$-CSBP, all started with mass $1/n$, which we view as $n$ different families. How many of those families have survived by time $t$? By Theorem 4.7, this is a Binomial random variable with parameters $n$ and $P_z(t)$, where $z = 1/n$. Recall that $P_z(t) = 1 - \exp(-z v(t)) \sim v(t)/n$ as $n \to \infty$. Let $K_n(t)$ be the number of families surviving by time $t$. By the Poisson approximation to binomial random variables,

$$ K_n(t) \longrightarrow \text{Poisson}(v(t)) \quad \text{in distribution} $$

as $n \to \infty$. On the other hand, $\lim_{n \to \infty} K_n(t)$ exists almost surely and is equal to the number of types in the Donnelly-Kurtz lookdown process. Thus this number has the distribution of the random variable on the right-hand side of the above display, which proves the corollary. □
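The Poisson limit in this proof can be checked numerically. The Python sketch below is our own illustration, with an arbitrary value $v = v(t) = 2$. It compares the Binomial$(n, 1 - e^{-v/n})$ law of $K_n(t)$ with the Poisson$(v)$ law in total variation, truncating both at a level `kmax`; the truncation is an assumption that is harmless for moderate $v$. Note that the probability that no family survives is $(e^{-v/n})^n = e^{-v}$ for every $n$. This is $P(Z_t = 0)$ for $Z_0 = 1$, as it should be.

```python
import math


def binomial_pmf(k, n, q):
    # math.comb(n, k) is 0 when k > n, so the pmf vanishes there
    return math.comb(n, k) * q**k * (1 - q) ** (n - k)


def poisson_pmf(k, mean):
    return math.exp(-mean) * mean**k / math.factorial(k)


def tv_to_poisson(n, v, kmax=60):
    """Total variation distance between Bin(n, 1 - exp(-v/n)) and Poisson(v), truncated at kmax."""
    q = -math.expm1(-v / n)  # P_{1/n}(t) = 1 - exp(-v(t)/n)
    return 0.5 * sum(abs(binomial_pmf(k, n, q) - poisson_pmf(k, v)) for k in range(kmax + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, tv_to_poisson(n, v=2.0))
```

The distance decreases roughly like $1/n$, in line with Le Cam's inequality for the Poisson approximation.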

Corollary 4.1 will be a crucial step towards proving fine results on $\Lambda$-coalescents in the next section. The Poisson distribution of the number of types is perhaps easier to understand in terms of continuum random trees, and should be compared with Theorem A.10. For the same reason, the number of excursions of Brownian motion up to time $\tau_1$ (the inverse local time at level 1) which reach a fixed height $h > 0$ is a Poisson random variable.

4.3 Coalescents and branching processes 

Having given a brief overview of continuous-state branching processes and the lookdown process, we are now ready to describe the connection between $\Lambda$-coalescents and certain CSBPs. Intuitively, this connection states that, at small times, the genealogy of any $\psi$-CSBP is given by a certain $\Lambda$-coalescent. It is essential to note that this description is valid only asymptotically as $t \to 0$ (however, an exact embedding exists in the $\alpha$-stable case, as will be discussed below). The connection allows us to state a general result about $\Lambda$-coalescents. This result gives a second necessary and sufficient condition for coming down from infinity and, when the coalescent does come down, the speed at which it does so. That is, we find a deterministic function $v(t)$ such that $N_t/v(t) \to 1$ almost surely as $t \to 0$, where $N_t$ is the number of blocks at time $t$.


4.3.1 Small-time behaviour 

We point out that if $\Lambda$ does not satisfy Definition 4.1, considerable difficulties arise, which generally lead to $N_t$ oscillating between different powers of $t$ as $t \to 0$. (An instructive example of such a measure is analysed in a fair amount of detail in [18].) The general solution that we now present is due to [17, 18].

Let $\Lambda$ be any finite measure on $[0,1)$. Then define

$$ \psi(q) = \int_0^1 \big(e^{-qx} - 1 + qx\big)\, x^{-2}\,\Lambda(dx), \qquad (117) $$

where the integrand is interpreted as its limit $q^2/2$ at $x = 0$. Let $(Z_t, t \ge 0)$ be the CSBP with branching mechanism $\psi$.

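Theorem 4.9. The $\Lambda$-coalescent comes down from infinity if and only if $Z$ becomes extinct in finite time, i.e.,

$$ \int_1^\infty \frac{dq}{\psi(q)} < \infty. \qquad (118) $$

If it does, define $v(t)$ by $\int_{v(t)}^\infty \frac{dq}{\psi(q)} = t$. Then as $t \to 0$,

$$ \frac{N_t}{v(t)} \longrightarrow 1 \qquad (119) $$

almost surely and in $L^p$ for every $p \ge 1$.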
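As a check, take $\Lambda = \delta_0$, the case of Kingman's coalescent. The integrand in (117) tends to $q^2/2$ as $x \to 0$, so $\psi(q) = q^2/2$, and

$$ \int_1^\infty \frac{2\,dq}{q^2} = 2 < \infty, \qquad \int_{v(t)}^\infty \frac{2\,dq}{q^2} = \frac{2}{v(t)} = t \quad\Longrightarrow\quad v(t) = \frac{2}{t}. $$

Theorem 4.9 therefore recovers the classical fact that Kingman's coalescent comes down from infinity, with $tN_t \to 2$ as $t \to 0$.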

Proof. It is easy to understand why coming down from infinity might be related to extinction in finite time of the CSBP. By Theorem 4.7 (see also Corollary 4.1), extinction of a CSBP is equivalent to the following: for any $t > 0$, only finitely many individuals at time 0 have descendants alive at time $t$. When this occurs, the entire population at time $t > 0$ comes from a finite number of ancestors. Running time backwards, and assuming that the genealogy is approximately described by a $\Lambda$-coalescent, this means the coalescent has come down from infinity. Once this connection is accepted, the convergence (119) also follows intuitively from a similar argument and Corollary 4.1.

We now explain why the genealogy of $(Z_t, t \ge 0)$ should be given by a $\Lambda$-coalescent asymptotically as $t \to 0$. Consider the CSBP $(Z_t, t \ge 0)$ with branching mechanism (117), and its genealogy defined in terms of the lookdown process. We know that at a time $t > 0$ such that $\Delta Z_t > 0$, a proportion $p = \Delta Z_t/Z_t$ of lineages is coalescing. Assume for simplicity that $Z_0 = 1$ (this is of course unimportant). Recall
that, by Lamperti's transform, $Z$ is a time-change of a Levy process $(Y_t, t \ge 0)$ with Laplace exponent $\psi(q)$. Now, both the time-change functional

$$ U_t = \int_0^t \frac{ds}{Y_s} $$

and $Y$ are almost surely continuous at $t = 0$. Therefore $Z$ is also continuous with probability 1 near $t = 0$. It follows that if $t$ is small, the proportion of lineages that coalesces at time $t$ is

$$ p = \frac{\Delta Z_t}{Z_t} \approx \Delta Z_t. \qquad (120) $$

Using Lamperti's transform once more, we see that $\Delta Z_t = \Delta Y_{U^{-1}(t)}$. We conclude that at time $U_t$, a fraction approximately equal to $\Delta Y_t$ of lineages coalesces. Since $U_t = \int_0^t (1/Y_s)\,ds$ and $Y_0 = 1$, we also have $U_t \approx t$ for small $t$.

To summarize, we have seen that, up to a negligible time-change, the fraction of lineages that coalesces is given by the jumps of the spectrally positive Levy process $(Y_t, t \ge 0)$. By the Levy-Ito decomposition, these jumps occur precisely as a Poisson point process with intensity

$$ dt \otimes x^{-2}\Lambda(dx), $$

since, from (117), the Levy measure of $Y$ is $x^{-2}\Lambda(dx)$. By the Poisson construction of the $\Lambda$-coalescent (Theorem 3.2), this gives precisely a $\Lambda$-coalescent. □

This is a convincing argument for why the genealogy of $Z$ close to $t = 0$ should be approximately given by a $\Lambda$-coalescent, but it is much harder to turn it into a rigorous proof. The difficulty, of course, lies in controlling the errors made in the approximations. The main source of error is the potentially wild fluctuations of the Levy process $Y$ around its starting point. These fluctuations are understood in a fair amount of detail (see, e.g., [135]). It is known, for instance, that the increments $Y_t - Y_0$ may oscillate between two different powers of $t$, whose exponents are the inverses of the so-called lower and upper indices. This helps to control the error, but ultimately this approach needs some regularity assumption on the Levy process: namely, that the lower index $\beta$ is strictly greater than 1. The approach described below bypasses these difficulties by giving a direct proof of Theorem 4.9.
4.3.2 The martingale approach

We now introduce a martingale discovered in [17], which is a crucial step in the proof of Theorem 4.9. Observe that the function $v(t)$ is the unique solution, among functions with $v(t) \to \infty$ as $t \to 0$, to the equation

$$ \log v(t) - \log v(z) + \int_z^t \frac{\psi(v(r))}{v(r)}\,dr = 0, \qquad \forall\, 0 < z < t. \qquad (121) $$

Since we wish to prove that $N_t$ is approximately equal to $v(t)$, it makes sense to study the quantity

$$ X^z_t := \log N_t - \log N_z + \int_z^t \frac{\psi(N_r)}{N_r}\,dr, \qquad \forall\, 0 < z < t. \qquad (122) $$

Lemma 4.1. There exists a bounded process $(H_t, t \ge 0)$ such that

$$ M^z_t := X^z_t + \int_z^t H_r\,dr $$

is a martingale. Moreover, there exists $C > 0$ such that the second moment of $M^z_t$ satisfies $E\big((M^z_t)^2\big) \le C(t-z)$ for all $0 < z < t$.

Since the second moment of $M^z$ is small, Doob's inequality implies that $M^z$ must be uniformly small. Since the infinitesimal error $H$ is negligible, $X^z$ itself is then small. Because $v$ is the unique solution to (121), this and some further analysis yield Theorem 4.9.
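For Kingman's coalescent, where $\psi(q) = q^2/2$ and hence $v(t) = 2/t$, the statement $N_t/v(t) \to 1$ is easy to observe by simulation. In the Python sketch below, the starting number $n$, the seed and the times are our own choices. With $k$ blocks, each of the $k(k-1)/2$ pairs merges at rate 1, so the simulation simply adds up exponential holding times. Because it starts from $n$ rather than infinitely many blocks, $t$ should be small but large compared with $2/n$.

```python
import random


def kingman_blocks(t, n, rng):
    """Number of blocks at time t of Kingman's coalescent started from n singletons."""
    k, clock = n, 0.0
    while k > 1:
        clock += rng.expovariate(k * (k - 1) / 2)  # waiting time until the next pairwise merger
        if clock > t:
            return k
        k -= 1
    return 1


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.1, 0.03, 0.01):
        print(t, kingman_blocks(t, 20_000, rng) * t / 2)  # N_t / v(t), roughly 1
```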

Why is this a martingale? Let $0 < p < 1$ and assume the number of blocks is currently $n$. Let $Y_{n,p}$ be a Binomial$(n,p)$ random variable, which we think of as the number of blocks that participate in a $p$-merger. If $Y_{n,p} > 0$, then the number of blocks after the $p$-merger is $n - Y_{n,p} + 1$. Now $p$-mergers occur at rate $p^{-2}\Lambda(dp)$. Moreover, $Y_{n,p}$ is typically small compared to $n$, so we may use the approximation $\log(1-x) \approx -x$. Hence

$$ \begin{aligned} E(d\log N_s \mid \mathcal{F}_s) &= ds \int_0^1 E\left[\log\left(\frac{n - Y_{n,p} + \mathbf{1}_{\{Y_{n,p} > 0\}}}{n}\right)\right] p^{-2}\Lambda(dp) \\ &\approx ds \int_0^1 \frac{-E(Y_{n,p}) + P(Y_{n,p} > 0)}{n}\, p^{-2}\Lambda(dp) \\ &= ds \int_0^1 \frac{-np + 1 - (1-p)^n}{n}\, p^{-2}\Lambda(dp). \end{aligned} $$

Recall that $n = N_s$, and note that $(1-p)^n \approx e^{-np}$ in the integral above. We then recognize in this integral the right-hand side of the equality (117) which defines $\psi$. Hence, up to some small errors,

$$ E(d\log N_s \mid \mathcal{F}_s) \approx -\frac{\psi(N_s)}{N_s}\,ds, \qquad (123) $$

which is consistent with Lemma 4.1.

Interpretation. With hindsight, the martingale $M^z$ has a simple interpretation. Recall that if $N_t$ is the number of blocks of a $\Lambda$-coalescent, then for all $z > 0$ we always get a martingale $(\tilde M_t, t \ge z)$ by defining

$$ \tilde M_t = N_t + \int_z^t \gamma(N_s)\,ds, \qquad t \ge z, $$

where $\gamma(n) = \gamma_n = \sum_{k=2}^n (k-1)\binom{n}{k}\lambda_{n,k}$. It is not hard to see analytically that $\psi(\lambda) \sim \gamma(\lambda)$ as $\lambda \to \infty$. Thus the martingale of Lemma 4.1 can be viewed as the martingale obtained by applying Ito's formula for discontinuous processes to $\log N_t$. From this point of view, it is rather surprising that the method of [17] is conclusive. Indeed, the logarithm is typically insensitive to small fluctuations and only picks up variations in the "power" of $t$. However, this turns out to be a strength as well, since the greatest challenge in this problem is to control the wild fluctuations of the function $\psi(\lambda)$, which may oscillate between two different powers of $\lambda$.
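The equivalence $\psi(\lambda) \sim \gamma(\lambda)$ can be observed numerically for the Beta$(2-\alpha,\alpha)$-coalescent, for which $\lambda_{n,k} = B(k-\alpha, n-k+\alpha)/B(2-\alpha,\alpha)$. Here $x^{-2}\Lambda(dx) = x^{-1-\alpha}(1-x)^{\alpha-1}dx/B(2-\alpha,\alpha)$. Combining (117) with $\int_0^\infty (e^{-qx}-1+qx)x^{-1-\alpha}dx = \Gamma(-\alpha)q^\alpha$ and $\Gamma(-\alpha) = \Gamma(2-\alpha)/(\alpha(\alpha-1))$ gives $\psi(q) \sim q^\alpha/(\alpha(\alpha-1)\Gamma(\alpha))$ as $q \to \infty$. The Python sketch below is our own illustration with hypothetical function names. It computes $\gamma_n$ exactly from its definition, working in log space to avoid overflow, and compares it with this leading term. The ratio approaches 1 only slowly, since the relative correction is of order $n^{1-\alpha}$.

```python
import math


def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def gamma_n(n, alpha):
    """gamma_n = sum_{k=2}^n (k-1) C(n,k) lambda_{n,k} for the Beta(2-alpha, alpha)-coalescent."""
    log_norm = log_beta(2 - alpha, alpha)
    total = 0.0
    for k in range(2, n + 1):
        log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        log_rate = log_beta(k - alpha, n - k + alpha) - log_norm  # log lambda_{n,k}
        total += (k - 1) * math.exp(log_comb + log_rate)
    return total


def psi_leading_term(q, alpha):
    """Leading behaviour q**alpha / (alpha (alpha - 1) Gamma(alpha)) of psi(q) as q -> infinity."""
    return q**alpha / (alpha * (alpha - 1) * math.gamma(alpha))


if __name__ == "__main__":
    alpha = 1.5
    for n in (100, 1000, 10000):
        print(n, gamma_n(n, alpha) / psi_leading_term(n, alpha))
```

Since $\Lambda$ is a probability measure here, $\gamma_2 = \lambda_{2,2} = 1$, which gives a quick consistency check.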

4.4 Applications: sampling formulae 

We now briefly explain how to obtain Theorem 4.1 and Theorem 4.2 from the small-time behaviour of the number of blocks, i.e., from Theorem 4.9.

Sketch of proof of Theorem 4.1. Consider the infinite alleles model, and make the following observation. Every mutation that appears on the tree is quite likely to have a corresponding representative in the allelic partition. Indeed, once a mutation arrives on the tree, it is quite difficult to fully disconnect it from the leaves, because a randomly chosen mutation is quite likely to be at the top of the tree. Analysing this more carefully, [18] shows:
Lemma 4.2. Assume that the $\Lambda$-coalescent comes down from infinity. Let $M_n$ be the total number of mutations on the coalescence tree restricted to $[n]$, and let $A_n$ denote the total number of families in the allelic partition restricted to $[n]$. Then $A_n/M_n \to 1$ in probability.

Given this lemma, our first task is thus to estimate the total number of mutations on the tree. (Note that this is identical to the total number of allelic types in the infinite sites model.) Since mutations fall as a Poisson process with intensity $\rho$, given the coalescence tree we have

$$ M_n \sim \text{Poisson}(\rho L_n), \qquad (124) $$

where $L_n$ is the total length of the tree, i.e., the sum of all the edge lengths in the coalescence tree of the first $n$ individuals in the sample. Thus the problem becomes that of finding asymptotics of $L_n$. Note that if initially the number of blocks is $N_0 = n$, then the total length of the tree may be written as

$$ L_n = \int_0^{C} N_t\,dt, \qquad (125) $$

where $C$ is the coalescence time of all $n$ individuals. Indeed, during the time interval $(t, t+dt)$ all branches of the tree together contribute $N_t\,dt$ to $L_n$, and integrating gives (125).

By consistency of the $\Lambda$-coalescent, we can rewrite (125) in terms of a $\Lambda$-coalescent started with infinitely many particles as $L_n = \int_\varepsilon^{C} N_t\,dt$. Here $\varepsilon$ is chosen such that $N_\varepsilon = n$, and $C$ is now the time at which $N_t$ reaches 1. (It is not at all obvious that such a choice of $\varepsilon$ is possible with positive probability. It can be proved that this is indeed the case, at least in the regular-variation case, as we will discuss later; other tricks, which we do not discuss here, are needed in the general case.)

Since $N_t \sim v(t)$, this means $\varepsilon \approx u(n)$, where $u(\lambda) = \int_\lambda^\infty \frac{dq}{\psi(q)}$ is the inverse function of $v$. We plug this into $L_n = \int_\varepsilon^{C} N_t\,dt$ and change variables to $\lambda = v(t)$, so that $t = u(\lambda)$ and $dt = -d\lambda/\psi(\lambda)$. We find

$$ L_n \approx \int_{\lambda_0}^{n} \frac{\lambda}{\psi(\lambda)}\,d\lambda, \qquad (126) $$

for some lower limit $\lambda_0$ of order 1. The lower bound of integration makes no difference, and we may replace it by 1 if we wish. Recalling (124), we now obtain directly the result of Theorem 4.1. □
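In the Kingman case $\psi(\lambda) = \lambda^2/2$, (126) gives

$$ L_n \approx \int_1^n \frac{\lambda}{\lambda^2/2}\,d\lambda = 2\log n. $$

This agrees with the exact computation for Kingman's coalescent. While there are $k$ blocks, the tree grows at speed $k$ for an exponential time with mean $2/(k(k-1))$, so that

$$ E(L_n) = \sum_{k=2}^n \frac{2}{k-1} = 2\sum_{i=1}^{n-1} \frac{1}{i} \sim 2\log n. $$

By (124), the number of mutations is then of order $2\rho\log n$.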
Sketch of proof of Theorem 4.2. A moment's thought, together with (110), shows the following: if $\Lambda(dx)$ satisfies the regular variation property of (102), then $\psi(\lambda) \sim C\lambda^\alpha$ as $\lambda \to \infty$, for some constant $C > 0$ whose value below may change from line to line. It follows by Theorem 4.1 that, in probability,

$$ \frac{A_n}{n^{2-\alpha}} \longrightarrow C. \qquad (127) $$

It turns out that the estimates in [17] and [18] are tight enough that this convergence holds almost surely. Since the allelic partition is obviously exchangeable, we may now apply results from Tauberian theory (Theorem 1.11), and Theorem 4.2 follows immediately.

4.5 A paradox. 

The following consequence of Theorem 4.9 says that Kingman's coalescent is extremal for the speed of coming down from infinity among all $\Lambda$-coalescents such that $\Lambda([0,1]) = 1$: no such coalescent comes down faster. This is a priori surprising. In Kingman's coalescent only two blocks ever coalesce at a time, whereas in a coalescent with multiple mergers it is always a positive fraction of blocks that merge. The assumption

$$ \Lambda([0,1]) = 1 \qquad (128) $$

is a scale assumption, as multiplying $\Lambda$ by any constant is equivalent to speeding up time by this factor.

Corollary 4.2. Assume (128). Then with probability 1, for any 
e > 0, and for all t sufficiently small, 



Proof. Without loss of generality assume that the A-coalescent comes 
down from infinity. To see how the corollary follows from Theorem 
4.9, observe that since e~'^^ <\ — qx + g^x^/2 for x > 0, 




(127) 



A[0, 1] = 1 



(128) 



Nt > j{l-e). 




x^v{dx) < y (due to (128)). 



(129) 



Hence if u{s) = -rr-r (so that v is the inverse of u): 



f°° 2 2 2 

u{s) > —^dq = - and v{t) > -. 
Js Q s t 



(130) 
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Due to Theorem 4.9, Nt ~ v{t) as t — > 0, implying that Nt > 
2(1 — e)/t with probabiUty 1 for aU t smah. □ 

4.6 Further study of coalescents with regular variation 

The next section is devoted to a finer study of the case where the measure Λ is the Beta(2 − α, α) distribution, with $1 < \alpha < 2$. In that case, an exact embedding of the coalescent in the corresponding continuous-state branching process (or the CRT) exists, and the special properties of this process (in particular, self-similarity) allow us to deduce several nontrivial properties of the Beta-coalescents. Some of these properties can be transferred by universality to the more general coalescents with regular variation.



4.6.1 Beta-coalescents and stable CRT 

Let $(Z_t, t \ge 0)$ be an α-stable CSBP (i.e., with $\psi(\lambda) = \lambda^\alpha$: see (110)). Assume for simplicity that $Z_0 = 1$. Using the same reasoning as in the sketch of the proof of Theorem 4.1, we may ask: at what rate will we observe a p-merger of the ancestral lineages, for any $0 < p < 1$? Let

$$q(x) = \frac{\alpha(\alpha-1)}{\Gamma(2-\alpha)}\, x^{-1-\alpha}$$

be the density of the Lévy measure associated with the α-stable branching mechanism $\psi(\lambda) = \lambda^\alpha$. Thus, if the current population size is λ, the rate at which there is a jump of size x in the population is $\lambda q(x)\,dx$. Reversing the direction of time, this means that a fraction p of lineages coalesces, where

$$p = \frac{x}{\lambda + x} \quad \text{or} \quad x = \frac{\lambda p}{1-p},$$

since the new population size after birth is $\lambda + x$. Thus:

$$\text{rate of } p\text{-merger} = \lambda q(x)\frac{dx}{dp}\,dp = \lambda\,\frac{\alpha(\alpha-1)}{\Gamma(2-\alpha)}\left(\frac{\lambda p}{1-p}\right)^{-1-\alpha}\frac{\lambda}{(1-p)^2}\,dp = c\,\lambda^{1-\alpha}p^{-2}\,\Lambda(dp),$$

where Λ is the Beta distribution with parameters 2 − α and α, and $c = \alpha(\alpha-1)\Gamma(\alpha)$. Thus if time is sped up by a factor $Z_t^{\alpha-1}/c$, the rate is exactly the rate of p-mergers in a Beta-coalescent. We have thus proved the following result.
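The last step of the computation above, identifying $\lambda q(x)\,dx/dp$ with $c\,\lambda^{1-\alpha}p^{-2}$ times the Beta(2 − α, α) density, is easy to get wrong by a constant. The following is a minimal numerical check of that identity (our own sketch; function names are hypothetical).

```python
import math


def merger_rate_from_jumps(lam, p, alpha):
    """lambda * q(x) * dx/dp, with x = lambda p / (1 - p) and q(x) the Levy density above."""
    x = lam * p / (1 - p)
    q = alpha * (alpha - 1) / math.gamma(2 - alpha) * x ** (-1 - alpha)
    dx_dp = lam / (1 - p) ** 2
    return lam * q * dx_dp


def merger_rate_from_beta(lam, p, alpha):
    """c * lambda^(1 - alpha) * p^(-2) * Beta(2 - alpha, alpha) density, c = alpha (alpha - 1) Gamma(alpha)."""
    c = alpha * (alpha - 1) * math.gamma(alpha)
    beta_fn = math.gamma(2 - alpha) * math.gamma(alpha)  # B(2 - alpha, alpha), since Gamma(2) = 1
    density = p ** (1 - alpha) * (1 - p) ** (alpha - 1) / beta_fn
    return c * lam ** (1 - alpha) * p ** -2 * density


for alpha in (1.2, 1.5, 1.9):
    for lam, p in ((0.5, 0.1), (3.0, 0.5), (10.0, 0.9)):
        a = merger_rate_from_jumps(lam, p, alpha)
        b = merger_rate_from_beta(lam, p, alpha)
        print(alpha, lam, p, a, b)  # the two columns agree up to rounding
```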

Theorem 4.10. Let $1 < \alpha < 2$. For $0 \le s \le t$, define a random partition $\pi^t_s$ by saying $i \sim j$ if and only if individuals i and j at time t have the same ancestor at time s in the Donnelly-Kurtz lookdown process of the α-stable branching process. Let $R_t = c\int_0^t Z_s^{1-\alpha}\,ds$ and let $R^{-1}$ be the càdlàg inverse of R. Then for all $t > 0$, if $\Pi_s = \pi^{R^{-1}(t)}_{R^{-1}(t-s)}$ for $0 \le s \le t$, the process

$$(\Pi_s, 0 \le s \le t)$$

is a Beta-coalescent run for time t.

The lookdown version of this result stated in the theorem was first proved by Birkner et al. in [36]. An equivalent version for continuum random trees was subsequently proved in [20], by showing that the two notions of genealogy defined by the lookdown process and by the continuum random tree must coincide. (It is the CRT version that is most useful for capturing fine aspects of the small-time behaviour, although see [19] for what can be done with the lookdown process alone.) The elementary approach given here is based on as yet unpublished work with J. Berestycki and V. Limic [18], and bypasses the rather complex calculations of [36]. This result may be seen as a generalization to the stable case of an important result due to Perkins [129] in the Brownian case: see also [16] for related results in this case.

Note that this result gives us a better understanding of Theorem 3.8, where genealogies of populations with offspring distribution in the domain of attraction of a stable random variable converge to a Beta-coalescent.

4.6.2 Backward path to infinity 

It is also possible to get some information about the time-reversal of the process, a bit like in Aldous' construction and Corollary 2.1. However, this is much more complicated for Beta-coalescents: the first difficulty is that one does not know how many blocks were lost in the previous coalescence (unlike in Kingman's coalescent, where we know that exactly one fragmentation must be made).

Roughly speaking, a first result in this direction, due to [19], says the following. If $N_t = n$ and $S_1, S_2, \ldots$ denote the previous numbers of blocks (going backward in time from t), then, after shifting everything by n, $(S_1 - n, S_2 - n, \ldots)$ converges in distribution to a random walk $(\tilde S_1, \tilde S_2, \ldots)$ with a nondegenerate step distribution. The limiting step $\tilde S_{i+1} - \tilde S_i$ turns out to have expected value $1/(\alpha-1)$. This result, combined with a renewal argument, shows:

Theorem 4.11 ([19]). Let $V_n$ be the event that $N_t = n$ for some t. Then

$$\lim_{n\to\infty} P(V_n) = \alpha - 1. \qquad (131)$$
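The renewal heuristic behind (131) can be made concrete under an assumption of ours that is not stated in the notes. We assume that the limiting step has the law of the number $k - 1$ of blocks lost in a single merger when the number of blocks is very large. For the Beta(2 − α, α)-coalescent this gives $P(k) \propto \Gamma(k-\alpha)/k!$ for $k \ge 2$. The generating function $\sum_{k \ge 0}\Gamma(k-\alpha)x^k/k! = \Gamma(-\alpha)(1-x)^\alpha$ then gives mean $1/(\alpha-1)$, so that the renewal density is $\alpha - 1$. The sketch below checks the mean numerically by truncating the series.

```python
import math


def limiting_step_mean(alpha, kmax=200_000):
    """Mean of k - 1 when P(k) is proportional to Gamma(k - alpha) / k!, k = 2, 3, ...

    The sum is truncated at kmax; the weights are generated by the recursion
    f(k + 1) = f(k) * (k - alpha) / (k + 1), starting from f(2) = Gamma(2 - alpha) / 2.
    """
    f = math.gamma(2 - alpha) / 2
    total = weighted = 0.0
    for k in range(2, kmax + 1):
        total += f
        weighted += (k - 1) * f
        f *= (k - alpha) / (k + 1)
    return weighted / total


for alpha in (1.5, 1.8):
    mean = limiting_step_mean(alpha)
    # 1 / mean is the renewal density, to be compared with alpha - 1
    print(alpha, mean, 1 / (alpha - 1), 1 / mean)
```

The truncation slightly underestimates the mean, because the step distribution has a heavy tail of order $k^{-1-\alpha}$.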



We also note that there exists a formula for the one-dimensional 
distributions of the Beta-coalescent, which can be found in Theorem 
1.2 of [19]. 

4.6.3 Fractal aspects 

Changing the topic, recall that for random exchangeable partitions, the number of blocks is inversely related to the typical block size (see Theorem 1.4). Here, at least informally, since the number of blocks at small time t is of order $t^{-1/(\alpha-1)}$, we see that the frequency of the block which contains 1 at time t should be of order $t^{1/(\alpha-1)}$ (this was proved rigorously in [19]). Put another way, almost all the fragments emerge from the original dust by growing like $t^{1/(\alpha-1)}$. We say that $1/(\alpha-1)$ is the typical speed of emergence.

However, some blocks clearly have a different behaviour. Consider for instance the largest block and denote by $W(t)$ its frequency at time t.

Theorem 4.12 ([19]).

$$\left(\alpha\Gamma(\alpha)\Gamma(2-\alpha)\right)^{1/\alpha}\, t^{-1/\alpha}\, W(t) \longrightarrow X \quad \text{as } t \downarrow 0,$$

where X has the Fréchet distribution of index α.

Hence the size of the largest fragment is of order $t^{1/\alpha}$, which is much bigger than the typical block size $t^{1/(\alpha-1)}$. Note that the randomness of the limiting variable X captures the intuitive idea of a reinforcement phenomenon: the bigger a block is, the higher its chance of coalescing later on. Random limits in laws of large numbers are indeed typical of processes with reinforcement such as Pólya's urn.

This suggests studying the existence of fragments that emerge at an atypical rate $\gamma \neq 1/(\alpha-1)$. To do so, it is convenient to consider a random metric space $(S, d)$ which completely encodes the coalescent Π (this space was introduced by Evans [77] in the case of Kingman's coalescent). The space $(S, d)$ is the completion of the space $(\mathbb N, d)$, where $d(i,j)$ is the time at which the integers i and j coalesce. Informally speaking, completing the space $\{1, 2, \ldots\}$ with respect to this distance adds, in particular, points that belong to blocks behaving atypically. In this framework we are able to associate with each point $x \in S$ and each $t > 0$ a positive number $\eta(x,t)$ equal to the frequency of the block at time t corresponding to x. (This is formally achieved by endowing S with a mass measure η.) In this setting, we can reformulate the problem as follows: are there points $x \in S$ such that the mass of the block $B_x(t)$ that contains x at time t behaves as $t^\gamma$ when $t \to 0$, or more formally such that $\eta(x,t) \asymp t^\gamma$? (Here $f(t) \asymp g(t)$ means that $\log f(t)/\log g(t) \to 1$.) Also, how many such points typically exist?

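We define, for $\gamma < 1/(\alpha-1)$,

$$S_{\text{thick}}(\gamma) = \left\{x \in S : \liminf_{t\to 0}\frac{\log \eta(x,t)}{\log t} \le \gamma\right\},$$

and similarly, when $\gamma > 1/(\alpha-1)$,

$$S_{\text{thin}}(\gamma) = \left\{x \in S : \limsup_{t\to 0}\frac{\log \eta(x,t)}{\log t} \ge \gamma\right\}.$$

When $\gamma < 1/(\alpha-1)$, $S_{\text{thick}}(\gamma)$ is the set of points which correspond to large fragments. On the other hand, when $\gamma > 1/(\alpha-1)$, $S_{\text{thin}}(\gamma)$ is the set of points which correspond to small fragments. In the next result we answer the question raised above by computing the Hausdorff dimension (with respect to the metric of S) of the sets $S_{\text{thick}}(\gamma)$ and $S_{\text{thin}}(\gamma)$.

Theorem 4.13 ([20]).

1. If $\frac{1}{\alpha} \le \gamma \le \frac{1}{\alpha-1}$, then
$$\dim_{\mathcal H} S_{\text{thick}}(\gamma) = \gamma\alpha - 1.$$
If $\gamma < 1/\alpha$, then $S_{\text{thick}}(\gamma) = \emptyset$ almost surely, but $S_{\text{thick}}(1/\alpha) \neq \emptyset$ almost surely.

2. If $\frac{1}{\alpha-1} \le \gamma \le \frac{\alpha}{(\alpha-1)^2}$, then
$$\dim_{\mathcal H} S_{\text{thin}}(\gamma) = \frac{\alpha}{\gamma(\alpha-1)^2} - 1.$$
If $\gamma > \frac{\alpha}{(\alpha-1)^2}$, then $S_{\text{thin}}(\gamma) = \emptyset$ almost surely, but $S_{\text{thin}}\big(\frac{\alpha}{(\alpha-1)^2}\big) \neq \emptyset$ almost surely.

Comment. The maximal value of $\dim_{\mathcal H} S(\gamma)$ is obtained when $\gamma = 1/(\alpha-1)$, in which case the dimension of $S(\gamma)$ is also equal to $1/(\alpha-1)$. This was to be expected, since this is the typical exponent for the size of a block. The value of the dimension then corresponds to the full dimension of the space S, as was proved in [19, Theorem 1.7].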
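The two branches of Theorem 4.13 can be encoded as one function of γ. This makes it easy to check the features described in Figure 10: continuity at $1/(\alpha-1)$ with value $1/(\alpha-1)$, zero at both endpoints, and one-sided slopes α and −α. The following is a minimal sketch with hypothetical names; α = 1.5 is an arbitrary example.

```python
def multifractal_dimension(gamma, alpha):
    """Hausdorff dimension of S_thick(gamma) or S_thin(gamma) from Theorem 4.13.

    Returns None outside [1/alpha, alpha/(alpha-1)^2], where the sets are a.s. empty.
    """
    low, crit, high = 1 / alpha, 1 / (alpha - 1), alpha / (alpha - 1) ** 2
    if low <= gamma <= crit:
        return gamma * alpha - 1                        # thick points
    if crit < gamma <= high:
        return alpha / (gamma * (alpha - 1) ** 2) - 1   # thin points
    return None


alpha, h = 1.5, 1e-6
crit = 1 / (alpha - 1)
left = (multifractal_dimension(crit, alpha) - multifractal_dimension(crit - h, alpha)) / h
right = (multifractal_dimension(crit + h, alpha) - multifractal_dimension(crit, alpha)) / h
print(multifractal_dimension(crit, alpha), left, right)  # about 2.0, 1.5, -1.5
```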



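Figure 10: Multifractal spectrum map $\gamma \mapsto \dim_{\mathcal H} S(\gamma)$ (horizontal axis marked at $1/\alpha$, $1/(\alpha-1)$ and $\alpha/(\alpha-1)^2$; maximum value $1/(\alpha-1)$). The left-derivative at the critical point is α while the right-derivative is −α.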



4.6.4 Fluctuation theory 

The analysis of fluctuations, even for Beta-coalescents, seems considerably more complicated than any of the law-of-large-numbers type results described above. Only very partial results exist to date. For instance, no result is available concerning the fluctuations of the number of blocks at small time, or of the biologically relevant total length of the tree. However, for the latter there is a partial result due to Delmas, Dhersin and Siri-Jégousse [59], which we describe below. We first need a result on the total number of collisions, which was proved simultaneously and independently by Gnedin and Yakubovich [92] on the one hand, and by Delmas, Dhersin and Siri-Jégousse [59] on the other hand. Let $n > 1$, and let $\tau_n$ denote the total number of coalescence events of a Beta-coalescent started from n blocks, before there is only one block left. That is, $\tau_n$ is the total number of jumps of the chain which counts the number of blocks (and decreases from n to 1).

Theorem 4.14. As $n \to \infty$,



where $(Y_s, s \ge 0)$ is a stable subordinator with index α started from 0.

This result also holds under a slightly more general form of regular variation than (102), and the exact assumptions needed in [92] and in [59] are slightly different (note that the α of [92] is what we call 2 − α here). Note in particular that, to first order, $\tau_n$ is of order $(\alpha-1)n$.
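The first-order statement $\tau_n \approx (\alpha-1)n$ can be observed in a short simulation of the block-counting chain. The sketch below is ours. It uses the standard Beta-coalescent rates, under which a given k-merger among b blocks occurs at rate proportional to $B(k-\alpha, b-k+\alpha)$, with $\binom{b}{k}$ such mergers. The values n = 1000, α = 1.5 and the seed are arbitrary choices. At this moderate n, expect only rough agreement with α − 1, since lower-order corrections are still visible.

```python
import bisect
import random
from functools import lru_cache


@lru_cache(maxsize=None)
def merger_cdf(b, alpha):
    """CDF of the merger size k in {2, ..., b} for a Beta(2 - alpha, alpha)-coalescent
    with b blocks: P(k) is proportional to C(b, k) * B(k - alpha, b - k + alpha)."""
    weights = [1.0]  # unnormalized weight of k = 2
    for k in range(2, b):
        # ratio w(k + 1) / w(k) of consecutive unnormalized weights
        weights.append(weights[-1] * (b - k) / (k + 1) * (k - alpha) / (b - k - 1 + alpha))
    total = sum(weights)
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    return cdf


def number_of_collisions(n, alpha, rng):
    """tau_n: number of jumps of the block-counting chain from n blocks down to 1."""
    blocks, tau = n, 0
    while blocks > 1:
        cdf = merger_cdf(blocks, alpha)
        k = 2 + min(bisect.bisect_left(cdf, rng.random()), len(cdf) - 1)
        blocks -= k - 1  # a k-merger replaces k blocks by one
        tau += 1
    return tau


rng = random.Random(2024)
n, alpha = 1000, 1.5
samples = [number_of_collisions(n, alpha, rng) for _ in range(20)]
print(sum(samples) / (len(samples) * n), alpha - 1)
```

Caching the merger distribution per block count keeps the cost quadratic in n in the worst case. The weights are built by a ratio recursion instead of evaluating Gamma functions term by term.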

Delmas, Dhersin and Siri-Jégousse then consider the length of a partial tree of coalescence, i.e., the sum of the lengths of the branches from time 0 until a given number k of coalescence events. In view of the above discussion, it is sensible to choose $k = \lfloor nt \rfloor$ with $t < \alpha - 1$. Let $L_n(t)$ denote the corresponding length. The main result of [59] (Theorem 6.1) is as follows, and shows a surprising phase transition at $\alpha = (1+\sqrt5)/2$, the golden ratio. Let



Theorem 4.15. Let $\alpha_0 = (1+\sqrt5)/2$. Let $(Y_s, s \ge 0)$ be a stable subordinator with index α started from 0, and let $\delta = \alpha - 1 - 1/\alpha$.

(1) Let $\alpha \in (1, \alpha_0)$. For all $t < \alpha - 1$, we have the convergence in distribution:

(132)

(2) Let $\alpha \in [\alpha_0, 2)$. Then for any $\varepsilon > 0$,

$$n^{-\varepsilon}\left(L_n(t) - n^{2-\alpha}\ell(t)\right) \to 0$$

in probability.
Intuitively, when we plug $t = \alpha - 1$ into the left-hand side, $L_n(t)$ is then almost the same as $L_n$, since by Theorem 4.14 there are approximately no more than $(\alpha-1)n$ coalescence events. Note that by doing so we recover the correct first-order approximation of Theorem 4.2. This strongly suggests that the result also holds for $L_n$ instead of $L_n(t)$ in the left-hand side and $t = \alpha - 1$ in the right-hand side, but this has not yet been proved.

Theorem 4.15 allows the authors of [59] to deduce a corollary about the fluctuations of the number of mutations that fall on the partial tree. For α sufficiently small, the fluctuations of the length of the tree dominate the Gaussian fluctuations induced by the Poissonization, hence yielding the random variable on the right-hand side of (132), while for larger α it is the opposite, and the limit is Gaussian. Surprisingly, the phase transition here occurs at $\alpha = \sqrt2$ rather than at the golden ratio. See Corollary 6.2 in [59] for further details.
5 Spatial interactions

We now approach an area which seems to be expanding at a rapid pace: the study of coalescing particle systems with spatial interactions. The prototype of such a process is the model of (instantaneously) coalescing random walks, so we first describe some classical work on this model, such as the duality with the voter model, the result of Bramson and Griffeath [42] on the long-term density of the system, and Arratia's Poisson point process limit [8]. (This is done by appealing to the intuitive approach recently developed by van den Berg and Kesten [25], which we describe.) We then move on to the stepping stone model, which is a spatial version of the Moran model, and describe some results of Cox and Durrett [53] and of Zähle, Cox and Durrett [152]. Reversing the direction of time leads us naturally to the spatial Λ-coalescents introduced by Limic and Sturm [116]. We describe their result about the long-term behaviour of spatial coalescents on a torus. We then push the description of spatial coalescents further by briefly describing the result of [6] on the global divergence of these processes. Finally, we draw a parallel with the work of Hammond and Rezakhanlou [96], which leads us to a general discussion of coalescent processes in continuous space.

5.1 Coalescing random walks 

The model of coalescing random walks is awfully simple to describe: let $d \ge 1$ and imagine that, initially, every site of $\mathbb Z^d$ is occupied by a single particle. As time evolves, particles perform independent simple random walks in continuous time with identical jump rates (say 1). The interaction arises when a particle jumps onto a site which is already occupied: in that case, the two coalesce instantly. That is, their trajectories are identical after that time. (One must first ensure that the model is well defined, but that is not a big problem: see, e.g., Liggett [115].) Obvious questions pertain to the density of this system and to the location of particles after rescaling so that the density is of order 1. More subtle ones ask where the set of ancestors of a given particle is located. This system may be easy to define, but its analysis is far from simple, and has given birth to a rich theory.
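To make the dynamics concrete, here is a minimal simulation sketch (ours) in dimension d = 1. The line ℤ is replaced by a cycle of L sites, an assumption that should be harmless when L is much larger than √t. Each particle carries a rate-1 clock and jumps to a uniform neighbour, and landing on an occupied site merges the two particles. The output can be compared with the d = 1 asymptotics $1/\sqrt{\pi t}$ of Theorem 5.1. For L = 2000 and t = 100 one should see a value near $1/\sqrt{100\pi} \approx 0.056$, up to statistical fluctuations. The parameters and seed are arbitrary.

```python
import math
import random


def coalescing_walks_density(L, t_max, rng):
    """Density at time t_max of coalescing random walks on the cycle Z/LZ,
    started with one particle per site (jump rate 1, uniform neighbour)."""
    positions = list(range(L))  # positions of surviving particles
    occupied = set(positions)
    t = 0.0
    while True:
        # time to the next jump among len(positions) independent rate-1 clocks
        t += rng.expovariate(len(positions))
        if t > t_max:
            break
        i = rng.randrange(len(positions))
        x = positions[i]
        y = (x + rng.choice((-1, 1))) % L
        occupied.remove(x)
        if y in occupied:
            # coalescence: drop particle i by swapping in the last entry
            positions[i] = positions[-1]
            positions.pop()
        else:
            positions[i] = y
            occupied.add(y)
    return len(positions) / L


rng = random.Random(7)
L, T = 2000, 100.0
density = sum(coalescing_walks_density(L, T, rng) for _ in range(3)) / 3
print(density, 1 / math.sqrt(math.pi * T))
```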
5.1.1 The asymptotic density

Let $p_t$ be the probability that there is a particle at the origin at time t. Of course, the system is translation-invariant, so this is also the probability that there is a particle at any given site. Hence $p_t$ should be regarded as the density of the system at time t. Bramson and Griffeath showed the following remarkable result on the asymptotic behaviour of $p_t$:

Theorem 5.1. As $t \to \infty$,

$$p_t \sim g_d(t) := \begin{cases} \dfrac{1}{\sqrt{\pi t}} & \text{if } d = 1,\\[2mm] \dfrac{\log t}{\pi t} & \text{if } d = 2,\\[2mm] \dfrac{1}{\gamma_d\, t} & \text{if } d \ge 3.\end{cases} \qquad (133)$$

Here $\gamma_d$ is the probability that simple random walk on $\mathbb Z^d$ never returns to its starting point.

Bramson and Griffeath's original proof was rather complicated and based on a moment calculation of Sawyer [141], which, being essentially a big computation, did not shed much light on the subject. Later, a simpler and more probabilistic proof was discovered by Cox and Perkins [56] using the super-Brownian invariance principle for the voter model of Cox, Durrett and Perkins [54]. As we will see, the voter model is indeed dual to coalescing random walks, and hence it is not a surprise that this invariance principle would be of great help.

However, a completely elementary approach has recently been developed by van den Berg and Kesten [25], and this approach has the merit of being extremely robust to changes in the details of the model. The drawback is that one often has to work in higher dimensions, i.e., $d \ge 3$. However, since this approach is both elementary and elegant, we propose a brief exposition of the main idea.

Proof (sketch). The idea is to try to compute the derivative of $p_t$. To first-order approximation,

$$\frac{d}{dt}p_t \approx -p_t^2, \qquad (134)$$

since the density decays when two particles meet, which happens at rate roughly $p_t^2$ if the locations of the particles were independent. If (134) were an equality, we could solve this differential equation and get that, as $t \to \infty$, $p_t \sim 1/t$. This gives the right order of magnitude when $d \ge 3$, but even then note that the constant is off: we are missing a factor $\gamma_d$ in (134).

A more precise version of (134) is the following. Assume that $d \ge 3$. Note that there is an exact expression for the derivative of $p_t$, based on the generator of the system. Let $\eta \in \{0,1\}^{\mathbb Z^d}$ denote the configuration of the system (where we identify η with the set of occupied sites). Let $\eta^{x\to y}$ denote the configuration where a particle at x has been moved to y, and $\eta^{x,\dagger}$ the configuration where a particle at x has been killed. Then the generator G of the system of particles is

$$Gf(\eta) = \frac{1}{2d}\sum_{x \in \eta}\sum_{y \sim x}\Big(\mathbf 1_{\{y \notin \eta\}}\big[f(\eta^{x\to y}) - f(\eta)\big] + \mathbf 1_{\{y \in \eta\}}\big[f(\eta^{x,\dagger}) - f(\eta)\big]\Big)$$

(where $y \sim x$ means that y and x are neighbours). Specializing to the function $f(\eta) = \mathbf 1_{\{0 \in \eta\}}$, and denoting $I = \#\{y \sim 0 : y \in \eta\}$, this implies

$$Gf(\eta) = \frac{I}{2d}\,\mathbf 1_{\{0 \notin \eta\}} - \mathbf 1_{\{0\in\eta\}} = \frac{I}{2d}\big(1 - \mathbf 1_{\{0\in\eta\}}\big) - \mathbf 1_{\{0\in\eta\}}.$$

Therefore, by translation invariance, since $E(I) = 2d\,P(0 \in \eta)$, we get:

$$\frac{d}{dt}p_t = E[Gf(\eta_t)] = -P(\text{both } 0 \text{ and } e_1 \text{ are occupied}), \qquad (135)$$

where $e_1$ is any of the origin's 2d neighbours. If the occupation of both 0 and $e_1$ at time t were independent events, we would thus immediately obtain equality in (134). However, this is far from being the case: indeed, if there is a particle at the origin, chances are that it has killed (coalesced with) any particle around it! There is thus a negative correlation effect here. Remarkably enough, it turns out that this effect can be evaluated.
Indeed, what is the chance that both 0 and $e_1$ are occupied? Let E be this event, and fix some number $\Delta t > 0$ such that $\Delta t \to \infty$ but $\Delta t = o(t)$ (think, for now, of $\Delta t = \sqrt t$). For the event E to occur, there must be two ancestor particles at time $t - \Delta t$, located at some positions x and $y \in \mathbb Z^d$. Moreover, the trajectories of two independent simple random walks started from x and y must find their way to 0 and $e_1$ during time Δt without ever meeting. If Δt is large enough, then chances are that x and y are far apart, in which case the events that x and y are occupied are approximately independent. Hence the probability that x and y are both occupied is about $p_{t-\Delta t}^2$. However, since $\Delta t = o(t)$, we have $p_{t-\Delta t} \sim p_t$, and thus this probability is approximately $p_t^2$. Now, to compute the probability that the two random walks end up at 0 and $e_1$ respectively without meeting, we use a time-reversal argument. It is the same as the probability that two random walks started at 0 and $e_1$ never meet during $[0, \Delta t]$ and end up at x and y. That is, letting S and S' be these walks:

$$P(E) \approx \sum_{x,y} P\big(S_{\Delta t} = x,\ S'_{\Delta t} = y,\ S_s \ne S'_s \text{ for all } s \in [0,\Delta t]\big)\,p_t^2 = p_t^2\,P\big(S_s \ne S'_s \text{ for all } s \in [0, \Delta t]\big).$$

But the difference of two independent rate-1 simple random walks is a rate-2 simple random walk, here started from $e_1$. Hence the probability of the event on the right-hand side is the same as the probability that a random walk started from $e_1$ does not return to the origin up to time Δt. Since $d \ge 3$, simple random walk is transient, and since Δt is large this probability is approximately $\gamma_d$. We conclude:

$$\frac{d}{dt}p_t \approx -\gamma_d\,p_t^2. \qquad (136)$$

Integrating this result gives Theorem 5.1.

Naturally, there are various sources of error in this approximation, all of which must be controlled. For instance, one error made in this calculation is that the ancestors x and y are not necessarily unique. They are, however, likely to be unique if Δt is not too big. van den Berg and Kesten [25] used this approach for a modified model of coalescing random walks, for which the method of Bramson and Griffeath completely collapses, since it relies on the duality with the voter model and on the exact computation of Sawyer. They were nevertheless able to obtain an asymptotic density result based on this heuristic (first in large enough dimension [25], and then for all $d \ge 3$ [26]). □
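The constant $\gamma_d$ appearing in (136) can be estimated by a quick Monte Carlo experiment. In the sketch below (ours), the discrete-time jump chain is used, since it has the same return probability as the rate-2 continuous-time walk in the argument. Truncating the walks at a finite number of steps slightly overestimates $\gamma_d$. For d = 3, the known return probability of about 0.34 gives $\gamma_3 \approx 0.66$. The number of walks, the number of steps and the seed are arbitrary choices.

```python
import random


def escape_fraction(d, walks, steps, rng):
    """Fraction of discrete-time simple random walks on Z^d started at the origin
    that do not return to it within `steps` steps (an upper estimate of gamma_d)."""
    escaped = 0
    for _ in range(walks):
        pos = [0] * d
        for _ in range(steps):
            pos[rng.randrange(d)] += rng.choice((-1, 1))
            if not any(pos):
                break  # returned to the origin
        else:
            escaped += 1
    return escaped / walks


rng = random.Random(5)
print(escape_fraction(3, 1000, 500, rng))  # gamma_3 is about 0.66
```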

5.1.2 Arratia's rescaling 

Soon after Bramson and Griffeath proved Theorem 5.1, Arratia considered the more precise question of what can be said about the location of the particles that have survived up to time t. In order to see a limiting point process, one has to rescale space so that the average number of particles in a cube of volume 1 is one, say. That is, we shrink each edge to a length of

$$\varepsilon := g_d(t)^{1/d} \qquad (137)$$

and let

$$V_t(dx) = \sum_{y \in \eta_t}\delta_{\varepsilon y}(dx),$$

where $\eta_t$ is the set of sites occupied at time t.
Arratia's remarkable result [8] is as follows:

Theorem 5.2. Assume that $d \ge 2$. Then, as $t \to \infty$, $V_t$ converges weakly to a Poisson point process with intensity dx, the Lebesgue measure on $\mathbb R^d$. If $d = 1$, then there exists a nondegenerate limit which is non-Poissonian.

Proof (sketch). Arratia's proof is deceptively short, and we only sketch the idea of why it works: the reader is invited to consult [8] for the real details. The reason we get a Poissonian limit only in dimension 2 and higher is that it is then possible to find a Δt such that $\Delta t = o(t)$ but Δt is large enough that

$$\sqrt{\Delta t} \gg 1/\varepsilon.$$

Since $\varepsilon = g_d(t)^{1/d} \sim 1/\sqrt{\pi t}$ in dimension $d = 1$, it is not possible to find such a Δt in dimension 1. However, taking for instance $\Delta t = t^{1/2 + 1/d}$ in dimension $d \ge 3$ and $\Delta t = t/\sqrt{\log t}$ for $d = 2$ works. Once this is the case, the idea is to say that if B is a fixed compact convex set of $\mathbb R^d$, then with high probability there are no coalescences within B between times $t - \Delta t$ and t. More precisely, if $\tilde V_t$ denotes the same point process as $V_t$ except that coalescences are not allowed during this time interval, then

$$P\big(V_t|_B \ne \tilde V_t|_B\big) \to 0, \quad \text{as } t \to \infty. \qquad (138)$$

Indeed, note that, on the one hand, $\tilde\eta$ always has more particles than η. On the other hand, apply the Markov property at time $s = t - \Delta t$. Let $p_t(x,y)$ denote the transition probabilities of continuous-time simple random walk on $\mathbb Z^d$, and let $\tilde\eta_t(x)$ be the number of particles at x at time t in the system without coalescences after time s. Then

$$E(\tilde\eta_t(x)) = \sum_y E(\eta_s(y))\,p_{\Delta t}(y,x) = p_s\sum_y p_{\Delta t}(x,y) = p_s.$$

Therefore, putting these two things together, we find, for $x \in \mathbb Z^d$:

$$P\big(\eta_t(x) \ne \tilde\eta_t(x)\big) \le E\big(\tilde\eta_t(x) - \eta_t(x)\big) \le p_s - p_t,$$

and it follows that

$$P\big(V_t|_B \ne \tilde V_t|_B\big) \le \sum_{x \in \mathbb Z^d \cap (\frac{1}{\varepsilon})B} P\big(\eta_t(x) \ne \tilde\eta_t(x)\big) \le 2\lambda(B)\,\frac{p_s - p_t}{p_t} \to 0.$$

It follows that we can pretend (with high probability) that no coalescence occurred, in which case particles behave as if they were independent simple random walks. However, since $\sqrt{\Delta t} \gg 1/\varepsilon$ (which is the typical size of the set B in the original lattice), "particles have enough time to mix", and thus their locations are i.i.d. uniform in B. Since the mean number of particles per unit volume is 1, this can only mean that particles are distributed as a Poisson point process with unit intensity.

This argument shows why the limit cannot be Poissonian in d = 1: particles meet and coalesce too often for them to have the time to get back to some sort of equilibrium density. Indeed, in dimension d = 1, the effect of negative correlations is so strong that it does not disappear even at large spatial scales. The limiting object is a process known as Arratia's coalescing flow, which shares some properties with the coalescing flow analysed in Theorem 3.15 for Kingman's coalescent. It is also intimately connected to an object called the Brownian web, which has been the subject of intense research recently. □
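The key quantitative condition in this sketch is that $\sqrt{\Delta t}\,\varepsilon \to \infty$ while $\Delta t/t \to 0$. The following arithmetic sketch (ours) evaluates both quantities for the choices of Δt given above. For d = 1 it takes the extreme choice Δt = t, which shows that even then $\sqrt{\Delta t}\,\varepsilon$ stays at $1/\sqrt\pi$. The value used for $\gamma_d$ is only a placeholder (about 0.66 for d = 3) and does not affect the growth rates.

```python
import math


def arratia_scales(t, d, gamma_d=0.66):
    """Return (sqrt(dt) * eps, dt / t) for the choice of dt in the proof sketch,
    where eps = g_d(t)^(1/d) is the rescaling (137)."""
    if d == 1:
        eps = 1 / math.sqrt(math.pi * t)
        dt = t  # largest conceivable choice; still sqrt(dt) * eps = 1 / sqrt(pi)
    elif d == 2:
        eps = math.sqrt(math.log(t) / (math.pi * t))
        dt = t / math.sqrt(math.log(t))
    else:
        eps = (gamma_d * t) ** (-1.0 / d)
        dt = t ** (0.5 + 1.0 / d)
    return math.sqrt(dt) * eps, dt / t


for d in (1, 2, 3):
    for t in (1e4, 1e8, 1e16):
        mix, frac = arratia_scales(t, d)
        print(d, t, round(mix, 3), round(frac, 5))
```

In d = 2 the mixing ratio grows only like $(\log t)^{1/4}$, which illustrates how borderline the two-dimensional case is.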



5.1.3 Voter model and super-Brownian limit 

To describe more precise questions connected with the geometry of the set of individuals that have coalesced by some time, it is useful to introduce a particle system called the (multitype) voter model. It turns out that this model is in duality with coalescing random walks on the one hand, and, on the other hand, that there exists an invariance principle for this model (due to Cox, Durrett and Perkins [54]). That is, this model is known to have super-Brownian motion as its scaling limit. This invariance principle has been further sharpened by Bramson, Cox and Le Gall [41]. They showed that a single "patch" (i.e., the set of individuals that coalesced to a single particle currently located at the origin, provided that there is such a particle) has the geometry of the super-Brownian excursion measure.

We first explain the notion of duality with the voter model. The (multitype) voter model is a particle system on $\mathbb Z^d$ in which each vertex holds an opinion. This opinion may take two values (0 or 1) in the two-type case, but in the multitype case every individual initially has their own opinion. As time evolves, at rate 1, any site x may infect a randomly chosen neighbour, say y: then y adopts the opinion that x currently holds. Thus it is convenient to label opinions by the vertices of $\mathbb Z^d$. To say that vertex x at time t has the opinion y means that there was a chain of infections from y to x within time t. (Thus x carries the opinion that individual y was carrying at time 0.) The duality between the two-type voter model and coalescing random walks η states the following. Let $P^B$ denote the probability for coalescing random walks $\eta_t$ started from a set $B \subset \mathbb Z^d$, and let $P^A$ denote the probability for the two-type voter model started from a set A. (That is, initially $x \in A$ carries opinion 1, and everybody else carries opinion 0.) Let $\xi_t$ denote the set of 1 opinions at time t in this model:

Theorem 5.3. Let $A, B \subset \mathbb Z^d$ be two subsets. Then we have the duality relation:

$$P(\xi_t \supseteq B \mid \xi_0 = A) = P(\eta_t \subseteq A \mid \eta_0 = B). \qquad (139)$$

Proof (sketch). The proof is most easily seen with a picture, as both processes can be constructed using a graphical representation. For each oriented edge $e = (x,y)$ linking two neighbouring vertices, associate an independent Poisson clock $(N^e(t), t \in \mathbb R)$ with rate $1/(2d)$, so that each site infects at total rate 1. This clock has two interpretations, depending on whether we wish to use it to construct coalescing random walks or the voter model. For the voter model, a ring of the clock signifies that x infects y, i.e., y adopts the opinion of x. For coalescing random walks, which run backward in time, a ring of the clock of $e = (x,y)$ means that a particle at y jumps to x. This is shown in Figure 11. Fix $t > 0$. For $s < t$ and two vertices x, y, we say there is a path down from $(t,y)$ to $(s,x)$ if tracing the infection arrows backward in time from y at time t leads to x at time s. Define a process $(W^y_s)_{0 \le s \le t}$ by putting $W^y_s = x$ if and only if there is a path down from $(t,y)$ to $(t-s, x)$. Then it is easy to see that $\{(W^y_s)_{0\le s\le t}\}_{y \in \mathbb Z^d}$ is a system of coalescing random walks run for time t. On the other hand, if $A \subset \mathbb Z^d$, we may define, for all $t > 0$, $\xi_t = \{y \in \mathbb Z^d : W^y_t \in A\}$. Then $(\xi_t)_{t \ge 0}$ has the law of the two-type voter model started from $\xi_0 = A$. Now note that, with this construction,

$$\{\xi_t \supseteq B\} = \{W^y_t \in A \text{ for all } y \in B\} = \{\eta_t \subseteq A\},$$

where $\eta_t$ is the system of coalescing random walks defined by the processes $\{(W^y_s)_{0\le s\le t}\}_{y \in B}$. This completes the proof. □
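The graphical construction makes (139) a pathwise identity, which is easy to test by simulation. The sketch below is ours and uses a small cycle ℤ/Lℤ (d = 1). Each oriented edge (x, y) rings at rate 1/2 = 1/(2d), meaning "x infects y". The voter model is run forward through the rings. The coalescing walks are run backward through the same rings, a particle at y jumping to x when (x, y) rings. For every realization, the event {ξ_t ⊇ B} should coincide with {η_t ⊆ A}. L, t, the random sets and the seed are arbitrary choices.

```python
import random


def graphical_representation(L, t, rng):
    """Infection events (time, x, y), meaning 'x infects y', on the cycle Z/LZ;
    each oriented edge carries a Poisson clock of rate 1/2 = 1/(2d) with d = 1."""
    events = []
    for x in range(L):
        for y in ((x - 1) % L, (x + 1) % L):
            s = rng.expovariate(0.5)
            while s < t:
                events.append((s, x, y))
                s += rng.expovariate(0.5)
    events.sort()
    return events


def voter_model(A, L, events):
    """Two-type voter model run forward: opinion 1 on A at time 0; returns xi_t."""
    opinion = [1 if x in A else 0 for x in range(L)]
    for _, x, y in events:
        opinion[y] = opinion[x]
    return {x for x in range(L) if opinion[x] == 1}


def coalescing_walks(B, events):
    """Coalescing walks started from B, run down from time t: a particle at y jumps
    to x when (x, y) rings; particles landing together merge (positions form a set)."""
    positions = set(B)
    for _, x, y in reversed(events):
        if y in positions:
            positions.remove(y)
            positions.add(x)
    return positions


rng = random.Random(11)
L, t = 12, 3.0
for _ in range(5):
    events = graphical_representation(L, t, rng)
    A = {x for x in range(L) if rng.random() < 0.5}
    B = {x for x in range(L) if rng.random() < 0.3}
    print(voter_model(A, L, events) >= B, coalescing_walks(B, events) <= A)  # equal pairs
```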

Observe that, with the voter model, we have nice dynamics forward in time with a clear branching structure. In dimension $d \ge 3$, the density of coalescing random walks is asymptotic to $1/(\gamma_d t)$. This implies that a given opinion at time 0 in the multitype voter model has probability about $1/(\gamma_d t)$ of surviving up to time t. It is well known that there is a similar behaviour for critical Galton-Watson branching processes with finite offspring variance σ² ([108], [102]): $P(Z_t > 0) \sim K/t$, where $K = 2/\sigma^2$.
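The Galton-Watson asymptotics $P(Z_n > 0) \sim 2/(\sigma^2 n)$ can be checked exactly by iterating the offspring generating function, since $P(Z_n = 0)$ is its n-fold composition evaluated at 0. The following is a minimal sketch with Poisson(1) offspring, a hypothetical example with mean 1 and $\sigma^2 = 1$, so that $n\,P(Z_n > 0)$ should approach K = 2.

```python
import math


def survival_probability(n, pgf):
    """P(Z_n > 0) for a Galton-Watson process with Z_0 = 1, using
    P(Z_n = 0) = f(f(...f(0))) (n-fold composition of the offspring pgf f)."""
    q = 0.0
    for _ in range(n):
        q = pgf(q)
    return 1.0 - q


def poisson_one_pgf(s):
    """Poisson(1) offspring: critical (mean 1) with variance sigma^2 = 1."""
    return math.exp(s - 1.0)


for n in (10, 100, 10_000):
    print(n, n * survival_probability(n, poisson_one_pgf))  # tends to K = 2 / sigma^2 = 2
```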

Figure 11: The duality between coalescing random walks and the voter model (panels: "Voter model" and "Coalescing random walks"). A dot indicates an edge which rings. Only the rings which affect the particles have been represented.

This suggests that, in dimensions $d \ge 3$, the branching structure of the voter model is well approximated by a critical Galton-Watson process. On top of this branching structure, particles are moving
according to simple random walk. Therefore, we do expect that, as $t \to \infty$, this system can be rescaled to super-Brownian motion, as this is precisely defined as the scaling limit of critical Galton-Watson branching processes with Brownian displacements (see Etheridge [72] for a wonderful introduction to the subject). The invariance principle of Cox, Durrett and Perkins states this result formally:

Theorem 5.4. Let $\xi_t$ denote a voter model started from a certain initial condition $\xi_0$. Let $d \ge 2$, and let $m_N = N$ if $d \ge 3$ or $m_N = N/\log N$ if $d = 2$. If $X^N_t(dx) = \frac{1}{m_N}\sum_{y \in \xi_{Nt}} \delta_{y/\sqrt N}(dx)$, then, provided $X^N_0 \Rightarrow X_0$ as $N \to \infty$, we have:

$$(X^N_t, t \ge 0) \Rightarrow (X_t, t \ge 0), \qquad (140)$$

the super-Brownian motion with branching rate $r = 2\gamma_d$ when $d \ge 3$ (the constant is different when $d = 2$; see [54]) and spatial variance $\sigma^2 = 1$.

We finish this discussion by stating the sharpened result of Bramson, Cox and Le Gall [41], which describes the geometry of the patch $I_t$ of the origin, conditionally on the event that there is a particle at the origin.

Theorem 5.5. There is the following convergence in distribution as $t \to \infty$:

$$\frac{1}{\sqrt t}\, I_t \longrightarrow \operatorname{Supp}(X),$$

where X has the distribution of a super-Brownian excursion conditioned to reach level 1.

(This convergence holds with respect to the Hausdorff metric on compact sets.) We have not properly defined the super-Brownian excursion, nor in fact super-Brownian motion. Informally, the former is the rescaled limit of a single critical Galton-Watson tree with finite variance, conditioned to reach a high level n, with independent Brownian displacements along the tree. See [41] for details.

5.1.4 Stepping stone and interacting diffusions

We now provide a very short and partial presentation of the stepping stone model of population genetics. Essentially, this is a spatial version of the Moran model studied in Theorem 2.3, and it may also be viewed as a generalization of the voter model of the previous section. The model is as follows. Fix a graph G which for us will always be the d-dimensional Euclidean lattice or a large d-dimensional torus $(\mathbb{Z}/L\mathbb{Z})^d$, and let $N \ge 1$. We view each site as a colony or deme, and N is the total number of individuals of a given population at each colony. Here we only define the model without mutations, but it is easy to add mutations if desired. The stepping stone model tracks the evolution of various allelic types at each site $x \in \mathbb{Z}^d$, as they are passed along to descendants. Since this is a spatial version of the Moran model, individuals reproduce in continuous time at constant rate equal to 1, and when individual i reproduces, some other individual j adopts the type of individual i. We merely need to specify how j is chosen. This is done as follows: fix $0 < \nu < 1$, which we think of as a small number. With probability $1-\nu$, j is chosen uniformly at random among the individuals other than i in the same colony as i, say $x \in \mathbb{Z}^d$. Otherwise, with probability $\nu$, j is chosen from another colony, with colony y being selected with probability $q(y,x)$, where $q(y,x)$ denotes the transition probabilities of a fixed random walk on $\mathbb{Z}^d$. This choice is made so that when we follow the genealogical lineages of this model backward in time, we obtain a random walk in continuous time with transition probabilities $q(x,y)$. In what follows, the reader may think of the case where $q(x,y)$ is not only symmetric in $x, y \in (\mathbb{Z}/L\mathbb{Z})^d$ and a function only of $y-x$, but also has the same symmetries as $\mathbb{Z}^d$. (Naturally, it is fine to think of the simple random walk transition probabilities $q(x,y) = \frac{1}{2d}\mathbf{1}_{\{x \sim y\}}$, where $x \sim y$ denotes that x and y are neighbours in the torus $(\mathbb{Z}/L\mathbb{Z})^d$.)
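The update rule above can be sketched in a few lines of Python. This is a minimal illustrative simulation, not taken from the references, and it rests on several assumptions: a two-dimensional torus, the nearest-neighbour kernel $q(x,y) = 1/4$ for neighbouring sites, equal colony sizes $N \ge 2$, and uniform initial types as in the next paragraph. The helper names (`stepping_stone_step`, `simulate`) are hypothetical.

```python
import random

# Nearest-neighbour displacements on Z^2 (assumed kernel q(x, y) = 1/4).
NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def stepping_stone_step(types, L, nu, rng):
    """Apply one reproduction event to `types` in place.

    types[(a, b)] is the list of the N allelic types in colony (a, b) of the
    torus (Z/LZ)^2. Since all colonies have the same size and every individual
    reproduces at rate 1, the reproducing individual i is uniform overall.
    """
    x = (rng.randrange(L), rng.randrange(L))
    n_col = len(types[x])
    i = rng.randrange(n_col)
    if rng.random() < 1 - nu:
        # Same colony: j is uniform among the other N - 1 individuals.
        j = rng.randrange(n_col - 1)
        if j >= i:
            j += 1
        types[x][j] = types[x][i]
    else:
        # Another colony y, chosen according to the (symmetric) kernel q.
        da, db = rng.choice(NEIGHBOURS)
        y = ((x[0] + da) % L, (x[1] + db) % L)
        j = rng.randrange(len(types[y]))
        types[y][j] = types[x][i]


def simulate(L=4, N=5, nu=0.1, events=2000, seed=0):
    """Start from i.i.d. uniform types and apply `events` reproduction events.

    In continuous time, events occur at total rate L^2 * N, so `events`
    corresponds to a time of roughly events / (L^2 * N).
    """
    rng = random.Random(seed)
    types = {(a, b): [rng.random() for _ in range(N)]
             for a in range(L) for b in range(L)}
    initial = {t for colony in types.values() for t in colony}
    for _ in range(events):
        stepping_stone_step(types, L, nu, rng)
    return types, initial


types, initial = simulate()
distinct = {t for colony in types.values() for t in colony}
print(len(initial), "initial types,", len(distinct), "still present")
```

Tracking the frequencies of the types within each colony over time gives the process described next.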

The exact quantity which we track may depend on the context. For instance, by analogy with the Fleming-Viot model of Theorem 3.11, it may be convenient to start the model with all individuals carrying allelic types given by independent uniform random variables on (0,1), and follow the process
$$\left( \frac{1}{N}\sum_{i=1}^{N} \delta_{\xi_i(t,x)} \right)_{x \in \mathbb{Z}^d,\ t \ge 0}.$$

Here $\xi_i(t,x) \in (0,1)$ denotes the type of the $i$-th individual at time t in colony x. The stepping stone model has a long history, which would take much too long to describe here. We simply mention the three papers which had a huge impact on the subject, starting with the work of Kimura [103] and the subsequent analysis by Kimura and Weiss [104] and by Weiss and Kimura [151]. Durrett [66] devotes Chapter 5 of his book to this model. It contains a fine review of some recent results on this model due in particular to Cox and Durrett [53] as well as Zähle, Cox and Durrett [152]. We describe some of the main results below. For those results, what matters is only the time of coalescence and the genealogical properties of the model (which is why the exact quantity tracked by the stepping stone model is not too important).

To start with, we describe some results regarding the time of coalescence of two lineages. We take the case d = 2, which is not only the most biologically relevant but also the most interesting mathematically. The result depends rather sensitively on the relative orders of magnitude of the starting locations of these lineages, the size L of the torus, and the size N of the population in each colony. To start with, assume that the two lineages are chosen uniformly at random from the torus. Let $\tau_0$ be the time which the lineages need to find themselves at the same location, and let $t_0$ be the total time they need to coalesce (thus $t_0 \ge \tau_0$ almost surely). The following result is due to Cox and Durrett [53].

Theorem 5.6. Let d = 2. For all t > 0, as $L \to \infty$ and $\nu \to 0$,
$$\mathbb{P}\left( \tau_0 > t\,\frac{L^2 \log L}{2\pi\sigma^2\nu} \right) \to e^{-t}.$$

Moreover, after $\tau_0$, the additional time $t_0 - \tau_0$ needed to coalesce satisfies:
$$\mathbb{E}(t_0 - \tau_0) = NL^2.$$

Here $\sigma^2$ denotes the variance of q in some arbitrary coordinate (since q has the same symmetries as $\mathbb{Z}^2$, it does not matter which one).

Proof. (sketch) By considering the difference of the locations of the two lineages, the question may be reformulated as follows: start from a location at random in the torus, and ask what is the hitting time of zero for a rate $2\nu$ continuous-time random walk $(X_t, t \ge 0)$ with kernel q. By stationarity, the expected amount of time X has spent at 0 by time $L^2$ is exactly equal to 1. On the other hand, if $X_0 = 0$, the expected amount of time with $X_s = 0$ in the next $L^2$ units of time is, by the local central limit theorem, approximately
$$\int_1^{L^2} \frac{ds}{2\pi(2\nu\sigma^2)s} \approx \frac{\log L^2}{2\pi(2\nu\sigma^2)} = \frac{\log L}{2\pi\nu\sigma^2},$$
since $\mathbb{P}_0(X_t = 0) \sim (2\pi\sigma^2(2\nu t))^{-1}$. Thus, on average the particle spends one unit of time at the origin by time $L^2$, but conditionally on hitting 0 this becomes approximately $\log L/(2\pi\nu\sigma^2)$. It is not hard to deduce from these two facts that
$$\mathbb{P}(\tau_0 < L^2) \sim \frac{2\pi\nu\sigma^2}{\log L}.$$

Using a mixing argument (after a large constant times $L^2$, the walk has mixed and forgotten its initial state, so there is a fresh chance to hit 0 in the next period of the same length) one can deduce the first result without too much difficulty. The second result is a much more general property of so-called symmetric matrix migration models, see Theorem 4.13 in [66]. □
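The logarithm in the proof comes from summing the local central limit estimate $\mathbb{P}_0(X_t = 0) \approx c/t$ over time. As a quick numerical sanity check of that mechanism, the sketch below swaps in discrete-time simple random walk on $\mathbb{Z}^2$. This is an assumption made for simplicity, since the proof uses a continuous-time walk with kernel q. For this walk $\mathbb{P}(S_{2k} = 0) = \big(\binom{2k}{k}4^{-k}\big)^2 \sim 1/(\pi k)$, so the expected number of returns by time 2K should grow like $(\log K)/\pi$.

```python
import math


def expected_returns(K):
    """Expected number of returns to 0 by time 2K of simple random walk on Z^2.

    Uses the classical identity P(S_{2k} = 0) = (C(2k, k) / 4^k)^2 and updates
    b_k = C(2k, k) / 4^k through the ratio b_k / b_{k-1} = (2k - 1) / (2k).
    """
    b = 1.0  # b_0 = C(0, 0) / 4^0
    total = 0.0
    for k in range(1, K + 1):
        b *= (2 * k - 1) / (2 * k)
        total += b * b
    return total


for K in (10**3, 10**4, 10**5):
    # The difference between the two columns should stay bounded as K grows.
    print(K, round(expected_returns(K), 4), round(math.log(K) / math.pi, 4))
```

The increment between $K = 10^4$ and $K = 10^5$ is close to $(\log 10)/\pi$, which is the logarithmic growth used in the proof.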

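There are thus two situations to consider for the asymptotics of $t_0$: either $t_0 - \tau_0$ dominates or $\tau_0$ dominates. The first one happens if
$$\mathbb{E}(\tau_0) = O\left(\frac{L^2\log L}{\nu}\right) \ll \mathbb{E}(t_0 - \tau_0) = NL^2,$$
i.e. if $N\nu/\log L \to \infty$. In this case we obtain:

Corollary 5.1. Assume that $N\nu/\log L \to \infty$. Then as $L \to \infty$,
$$\frac{t_0}{NL^2} \to \mathcal{E} \quad \text{in distribution},$$
where $\mathcal{E}$ is an exponential random variable.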

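The interesting case occurs, of course, when both contributions to $t_0$ are of comparable order of magnitude. Thus we assume
$$\frac{2\pi\sigma^2 N\nu}{\log L} \to \alpha. \qquad (141)$$

Theorem 5.7. Under (141), for all t > 0,
$$\mathbb{P}\left( t_0 > (1+\alpha)\frac{L^2\log L}{2\pi\sigma^2\nu}\,t \right) \to e^{-t}. \qquad (142)$$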
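The time scales in Theorem 5.6, (141) and Theorem 5.7 fit together as follows: $(1+\alpha)$ times the mean meeting time equals the mean meeting time plus $NL^2$. The short Python check below makes this bookkeeping explicit. It assumes the normalization of α in (141) as written above. The parameter values are arbitrary, and $\sigma^2 = 1/2$ corresponds to the nearest-neighbour kernel on $\mathbb{Z}^2$.

```python
import math


def coalescence_time_scales(L, N, nu, sigma2):
    """Leading-order time scales for two lineages on the torus (d = 2)."""
    mean_meeting = L**2 * math.log(L) / (2 * math.pi * sigma2 * nu)  # scale of tau_0
    mean_extra = N * L**2                                            # E(t_0 - tau_0)
    alpha = 2 * math.pi * sigma2 * N * nu / math.log(L)              # ratio in (141)
    h_L = (1 + alpha) * mean_meeting                                 # scale in (142)
    return mean_meeting, mean_extra, alpha, h_L


m, e, a, h = coalescence_time_scales(L=1000, N=50, nu=0.01, sigma2=0.5)
print(f"alpha = {a:.3f}, h_L = {h:.4g}")
```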
This is complemented by a result, due to [152], which says that the genealogy of a random sample of k individuals from the torus is approximately given by Kingman's coalescent, after a suitable time-change. Let $h_L = (1+\alpha)L^2\log L/(2\pi\sigma^2\nu)$. The following result (Theorem 2 in [152]) is originally formulated for the number of lineages backward in time of such a sample, but it can be reformulated in terms of convergence to Kingman's coalescent, which we do here. Let $k \ge 1$ and let $(\Pi^{k,L}_t, t \ge 0)$ denote the ancestral partition process for these k individuals.

Theorem 5.8. As $L, N \to \infty$ and $\nu \to 0$ in such a way that (141) holds,
$$(\Pi^{k,L}_{h_L t}, t \ge 0) \to (\Pi^k_t, t \ge 0),$$
Kingman's k-coalescent.
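Theorem 5.8 says that, after the time change $t \mapsto h_L t$, the ancestral partition behaves like Kingman's k-coalescent. For reference, here is a minimal Python simulation of that limiting object. With b blocks, each pair merges at rate 1, so the waiting time is exponential with rate $b(b-1)/2$ and the merging pair is uniform. The function name and return format are illustrative. The known mean time to the most recent common ancestor, $2(1-1/k)$, gives a simple check.

```python
import random


def kingman_coalescent(k, rng):
    """Simulate Kingman's k-coalescent on {1, ..., k}.

    Returns the time to the most recent common ancestor and the list of
    (time, partition) pairs recorded after each merger.
    """
    blocks = [{i} for i in range(1, k + 1)]
    t = 0.0
    history = []
    while len(blocks) > 1:
        b = len(blocks)
        t += rng.expovariate(b * (b - 1) / 2)   # total pairwise merger rate
        i, j = sorted(rng.sample(range(b), 2))  # uniformly chosen pair, i < j
        absorbed = blocks.pop(j)                # j > i, so index i is unaffected
        blocks[i] = blocks[i] | absorbed
        history.append((t, sorted(sorted(B) for B in blocks)))
    return t, history


rng = random.Random(1)
k = 5
samples = [kingman_coalescent(k, rng)[0] for _ in range(20000)]
print(sum(samples) / len(samples), "vs", 2 * (1 - 1 / k))
```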

The proof of this result follows the lines of an argument due to Cox and Griffeath [55], who proved a similar result in the context of the voter model, or coalescing random walks. Naturally, the idea is to exploit the fact that particles are very well mixed on the torus by the time they coalesce. This leads to the asymptotic Markov property of the ancestral partition process, and to the fact that every pair is equally likely to be the next one to coalesce. (The fact that only pairwise mergers occur is also a consequence of the relative difficulty of coalescing in more than 1 dimension: we rarely see three particles close enough to coalesce instantly on the time scale that we are looking at.)

A particularly interesting case of this question arises when the two lineages are not selected uniformly at random from the whole torus, but from a subdomain of the torus which is a square of side length $L^\beta$ with $0 < \beta < 1$. This reflects the fact that, in many biological studies, samples come from a fairly small portion of the space (see [152] or Section 5.3 of [66] for results). In that case Theorem 5.8 still holds, but the time-change is slightly more complicated (in particular, it is not just linear). In the next section on spatial Λ-coalescents, an even more extreme situation, where individuals are sampled from the exact same location, is presented.

Note that mutations may be added to the stepping stone model without difficulty (instead of adopting the type of its parent, a newborn adopts a new, never-seen-before type). Also, imagine following forward in time the evolution of the densities at various sites of a certain subpopulation (say the individuals of type a, with no mutations). We can then establish a relation of duality with certain interacting Wright-Fisher diffusions. These diffusions $(p_x(t), t \ge 0)_{x \in \mathbb{Z}^d}$ are characterized by the infinite system of SDEs:
$$dp_x(t) = \sum_{y} q(x,y)\big(p_y(t) - p_x(t)\big)\,dt + \sqrt{p_x(t)\big(1 - p_x(t)\big)}\,dW_x(t), \qquad (143)$$
where $\{W_x\}_{x \in \mathbb{Z}^d}$ is a collection of independent Brownian motions. This duality is exactly the spatial analogue of Theorem 2.7 between Kingman's coalescent and the Wright-Fisher diffusion. (Note that existence and uniqueness of solutions to (143) are nontrivial, but follow from the duality method.)
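To get a feel for (143), here is a crude Euler-Maruyama sketch in Python. It works on a small torus $(\mathbb{Z}/L\mathbb{Z})^2$ instead of $\mathbb{Z}^2$, with the nearest-neighbour kernel $q(x,y) = 1/4$ and the migration term as written in (143). Both the finite torus and the clipping of the iterates to [0,1] (needed because the Euler scheme can overshoot the boundary) are simplifying assumptions, not part of the model. The step size and horizon are arbitrary.

```python
import math
import random


def interacting_wright_fisher(L=8, p0=0.5, T=1.0, dt=1e-3, seed=0):
    """Euler-Maruyama scheme for (143) on the torus (Z/LZ)^2.

    Migration term: sum_y q(x, y) (p_y - p_x) with q the nearest-neighbour kernel.
    Noise term: sqrt(p_x (1 - p_x)) dW_x, with independent Gaussian increments.
    """
    rng = random.Random(seed)
    nbrs = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    p = {(a, b): p0 for a in range(L) for b in range(L)}
    for _ in range(round(T / dt)):
        new = {}
        for (a, b), px in p.items():
            drift = sum(p[((a + da) % L, (b + db) % L)] - px for da, db in nbrs) / 4
            noise = math.sqrt(px * (1 - px)) * rng.gauss(0.0, math.sqrt(dt))
            # Clip so that the frequencies stay in [0, 1].
            new[(a, b)] = min(1.0, max(0.0, px + drift * dt + noise))
        p = new
    return p


p = interacting_wright_fisher()
print("mean frequency:", sum(p.values()) / len(p))
```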

5.2 Spatial Λ-coalescents
5.2.1 Definition 

When considering population models in which the geometric interaction of individuals is taken into account, we are led to study a process introduced by Limic and Sturm in 2006 [116], called the spatial Λ-coalescent. Loosely speaking, these models are obtained by considering the ancestral partition process associated with the stepping stone model of the previous section. However, there are two differences. On the one hand, the graph G will be the d-dimensional lattice $\mathbb{Z}^d$ rather than a torus, where d = 1, 2 are the most relevant cases: for instance, for d = 1 one can think of a species which lives on an essentially one-dimensional coastline. More importantly, there is an additional degree of generalisation compared to the stepping stone model. Strictly speaking, if we reverse time in the stepping stone model (and speed up time by $(N-1)/2$, as in Theorem 2.3), the process which we obtain is the spatial Kingman coalescent: pairs of particles coalesce at rate 1 when they are on the same site, and otherwise perform independent random walks in continuous time. In the spatial Λ-coalescent, the mechanism of coalescence allows for multiple mergers at any site. More precisely, the model is defined as follows. Given a graph G and a set of particles:

1. Particles follow the trajectories of independent simple random walks in continuous time on $\mathbb{Z}^d$, with a fixed jump rate $\rho > 0$.
2. Particles that are on the same site coalesce according to the dynamics of a Λ-coalescent.

Definition 5.1. This model is the spatial Λ-coalescent of Limic and Sturm [116].

We content ourselves with this informal description of the process and refer the reader to [116] for a more rigorous one. It is nontrivial to check that the process is well-defined on an infinite graph, but this can be done using a graphical construction together with the Poissonian construction of Λ-coalescents. We do not specify the initial configuration in this definition. In fact, it could be quite arbitrary: in particular, it is possible to start the process with an infinite number of particles on all sites of a given (possibly infinite) subset. This follows from a natural consistency property which is inherited directly from the consistency of Λ-coalescents.

A particular case of interest is, naturally, the spatial Kingman coalescent, where particles perform independent simple random walks with constant jump rate $\rho > 0$ and each pair of particles on the same site coalesces at rate 1. This is related to the genealogical tree associated with the stepping stone model.
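A minimal Gillespie-type simulation of the spatial Kingman coalescent on $\mathbb{Z}^d$ may help fix ideas. It tracks only the number of particles per site, which is all the dynamics need. Particles make nearest-neighbour jumps at rate ρ, pairs on the same site merge at rate 1, and the process starts from n particles at the origin. This is an illustrative sketch, not code from [116]; the function and parameter names are our own.

```python
import random
from collections import Counter


def spatial_kingman(n, t_max, rho=1.0, d=1, seed=0):
    """Site occupation counts at time t_max, starting from n particles at 0."""
    rng = random.Random(seed)
    sites = Counter({(0,) * d: n})
    t = 0.0
    while True:
        jump_rate = rho * sum(sites.values())          # each particle jumps at rate rho
        pair_rates = {x: k * (k - 1) / 2 for x, k in sites.items() if k > 1}
        coal_rate = sum(pair_rates.values())           # each pair on a site merges at rate 1
        total = jump_rate + coal_rate
        t += rng.expovariate(total)
        if t > t_max:
            return sites
        if rng.random() * total < jump_rate:
            # A uniformly chosen particle jumps to a uniform nearest neighbour.
            x = rng.choices(list(sites), weights=list(sites.values()))[0]
            axis, step = rng.randrange(d), rng.choice((-1, 1))
            y = list(x)
            y[axis] += step
            sites[x] -= 1
            if sites[x] == 0:
                del sites[x]
            sites[tuple(y)] += 1
        else:
            # A site is chosen with probability proportional to its number of
            # pairs, and two of its particles merge into one.
            x = rng.choices(list(pair_rates), weights=list(pair_rates.values()))[0]
            sites[x] -= 1


sites = spatial_kingman(n=1000, t_max=1.0)
print(sum(sites.values()), "particles on", len(sites), "sites")
```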

A first property of spatial Λ-coalescents concerns measures Λ for which the Λ-coalescent comes down from infinity (i.e., if Grey's condition is satisfied or the condition in Theorem 3.7 holds). For such Λ, at every time t > 0 there is only a finite number of particles on every site. Intuitively, this is because coming down from infinity is a phenomenon that happens so close to t = 0 that particles don't have time to jump before it happens. Limic and Sturm actually showed a stronger statement than this: coming down from infinity, interpreted in the sense of a finite number of particles per site, always happens in a uniform way, independently of the graph structure. If B is any finite subset of the graph, consider the first time when the number of particles in B is no more than k per site on average (i.e., no more than $k|B|$ particles in B). Then the following estimate holds:

Theorem 5.9. Assume that $\sum_{b=2}^{\infty}\gamma_b^{-1} < \infty$, so that the Λ-coalescent comes down from infinity. Then for any graph G,
5.2.2 Asymptotic study on the torus

Limic and Sturm considered the situation of a spatial Λ-coalescent on a large torus of $\mathbb{Z}^d$. They were able to show the following result: if $d \ge 3$, as the torus increases to $\mathbb{Z}^d$, the genealogy of an arbitrary number of samples from the torus is well approximated by a mean-field Kingman's coalescent run at a certain speed. This is the analogue of Theorem 5.8, except for the initial condition (as we now explain).

To state this result, let G be the total expected number of visits to the origin by a simple random walk in $\mathbb{Z}^d$ started at the origin, that is, $G = 1/\gamma_d$ where $\gamma_d$ is the probability of never returning to the origin. Define

Suppose that a fixed number n of particles is sampled from the torus of side-length 2N + 1, and that the initial locations $v_1, \dots, v_n$ of these particles do not change as $N \to \infty$ (here the torus is viewed as a subset of $\mathbb{Z}^d$ with periodic boundary conditions). Because simple random walk is transient, it is possible to define unambiguously a partition $\Pi^\infty$, the eventual partition formed by running the dynamics of the spatial Λ-coalescent on $\mathbb{Z}^d$. Thus $i \sim j$ in $\Pi^\infty$ if the particles started from $v_i$ and $v_j$ coalesce at some point. (This partition is typically nontrivial because of transience!)

Theorem 5.10. Let $\Pi^N_t$ denote the partition obtained from running the dynamics of the spatial Λ-coalescent on the torus of side-length N for time t. If $(K_t, t \ge 0)$ is the (mean-field) Kingman coalescent of Chapter 2, started from the partition $\Pi^\infty$, then we have:
$$(\Pi^N_{N^d t}, t \ge 0) \to (K_{\kappa t}, t \ge 0). \qquad (145)$$

This convergence holds as $N \to \infty$, in the sense of the Skorokhod topology for $\mathcal{P}_n$.

Proof. (sketch) The idea of the proof is the same as in Theorem 5.8 and in [55]. We are working in dimension higher than 2 here, but that can only help mixing. Note that the definition of the term κ in (144) precisely takes into account the effects of transience and mixing on the one hand, and of coalescence on the other. This is why both G and $\lambda_{2,2}$ come up in this definition. That κ depends on Λ only through $\lambda_{2,2} = \Lambda([0,1])$ is to be expected. Typically, when particles meet, the density is so low that there are only two particles at this site, and the others are far away. □

We point out that a version of Theorem 5.10 was first proved by 
Cox and Griffeath [55] for instantaneously coalescing random walks 
and by Greven, Limic and Winter for the spatial Kingman coalescent 
[93]. 

5.2.3 Global divergence 

Theorem 5.9 tells us that, for a Λ-coalescent which comes down from infinity, there are only a finite number of particles per site. What it does not tell us is whether a finite number of particles are left in total at any time t > 0. To consider an extreme case, imagine that the initial configuration at time 0 is an infinite number of particles on the same site but no particle anywhere else. What is the total number of particles at time t > 0? It might happen that, even though there are only a finite number of particles per site, sufficiently many have escaped and they have instantly spread all over the lattice. The answer to these questions is provided in [6], which shows that global divergence is a universal rule, no matter what the graph and the underlying measure Λ are. We start with the case of Kingman's coalescent.

Theorem 5.11. Let $N_t$ be the number of particles at time t if initially there are n particles at the origin in $\mathbb{Z}^d$. Then there exist $c_1, c_2 > 0$ such that with probability 1 as $n \to \infty$,
$$c_1(\log^* n)^d \le N_t \le c_2(\log^* n)^d. \qquad (146)$$

Here the function $\log^* n$ is simply defined as the inverse $\log^* n := \inf\{m \ge 1 : \mathrm{Tow}(m) \ge n\}$ of the tower function:
$$\mathrm{Tow}(n) = e^{\mathrm{Tow}(n-1)} := \underbrace{e^{e^{\cdot^{\cdot^{\cdot^{e}}}}}}_{n \text{ times}}. \qquad (147)$$

In words, start with a number n, and take the logarithm of this number iteratively until you reach a number smaller than 1. The number of iterations is $\log^* n$. In particular, $\log^* n$ grows to infinity incredibly slowly: for instance, if $n = 10^{80}$ (the number of particles in the universe), then $\log^* n = 4$. Thus, whether or not you consider that the spatial Kingman coalescent comes down from infinity is essentially a matter of taste! (In particular, if you are a biologist, you might consider that $\log^* n = 3$ for any practical n...)
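The definition of $\log^* n$ translates directly into code. Since $\mathrm{Tow}(m) \ge n$ exactly when applying the logarithm m times to n gives a value at most 1, the Python sketch below counts logarithms instead of building towers. Towers overflow floating point already at $\mathrm{Tow}(4)$, so `tower` is only meant for $m \le 3$. Both function names are illustrative.

```python
import math


def tower(m):
    """Tow(m) with Tow(0) = 1 and Tow(m) = exp(Tow(m - 1)); overflows for m >= 4."""
    x = 1.0
    for _ in range(m):
        x = math.exp(x)
    return x


def log_star(n):
    """inf{m >= 1 : Tow(m) >= n}, computed by iterating log until the value is <= 1."""
    m, x = 1, math.log(n)  # math.log also accepts very large Python integers
    while x > 1:
        x = math.log(x)
        m += 1
    return m


print(log_star(10**80))  # 4
print(tower(2), log_star(15), log_star(16))
```

The second line illustrates the jump of $\log^* n$ from 2 to 3 just above $\mathrm{Tow}(2) = e^e \approx 15.15$.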

Since Kingman's coalescent is the fastest to come down from infinity (Corollary 4.2), it follows that any spatial Λ-coalescent on $\mathbb{Z}$ is globally divergent. Since moreover $\mathbb{Z}$ is the smallest bi-infinite graph, we get:

Theorem 5.12. Any spatial Λ-coalescent on any infinite graph is globally divergent (i.e., does not come down from infinity).

This is in sharp contrast with the non-spatial case, where coming down from infinity depends on the measure Λ.

Proof. (of Theorem 5.11, sketch) The basic idea is to first focus on the vertex at the origin, and investigate how many particles ever make it out of the origin. We may thus represent the coalescence of particles there as a tree, and put a mark on the tree to indicate that the corresponding particle has jumped. For the moment, ignore the behaviour of particles after they have left the origin; for instance, you may think that a particle that jumps is frozen immediately. Note that, since jumps occur at constant rate ρ, the number of particles that ever leave the origin is exactly equal to the number of families in the allelic partition with mutation rate ρ. In particular, this number is equal to $K_n$, the number of blocks in a $PD(\theta)$ random partition restricted to [n], with $\theta/2 = \rho$. By Theorem 2.10 (which is a simple consequence of the Chinese Restaurant Process), this number is approximately $\theta\log n$.
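The estimate $K_n \approx \theta\log n$ can be checked numerically with the Chinese Restaurant Process. The (i+1)-st customer opens a new table with probability $\theta/(\theta+i)$, so $\mathbb{E}(K_n) = \sum_{i=0}^{n-1}\theta/(\theta+i)$. The Python sketch below compares simulation, this exact mean and $\theta\log n$. In the proof θ = 2ρ; the parameter values here are arbitrary choices. Convergence of the ratio to 1 is slow, since the correction is of constant order.

```python
import math
import random


def crp_table_count(n, theta, rng):
    """Number of tables K_n after seating n customers in a CRP(theta)."""
    tables = 0
    for i in range(n):
        # Customer i + 1 sits at a new table with probability theta / (theta + i).
        if rng.random() < theta / (theta + i):
            tables += 1
    return tables


def expected_table_count(n, theta):
    return sum(theta / (theta + i) for i in range(n))


rng = random.Random(0)
n, theta = 10**4, 2.0  # theta = 2 * rho with jump rate rho = 1
sims = [crp_table_count(n, theta, rng) for _ in range(100)]
print(sum(sims) / len(sims), expected_table_count(n, theta), theta * math.log(n))
```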

Moreover, some more precise computations show that most of the action happens in a very short span of time. Roughly, the vast majority of the particles which are going to leave the origin have already done so by time $1/(\log n)^{a}$, for a suitable fixed power a > 0. By this time, particles which have jumped once have not had time to jump any further or come back. Since there are about 2/t blocks in Kingman's coalescent at a small time t, there are about $2(\log n)^{a}$ particles left at the origin. There are also about $n_1 = (\theta/(2d))\log n$ particles at each neighbour (by the law of large numbers). Starting from here, we can replicate this argument: about $\theta\log n_1$ particles will ever leave each site at distance 1 from the origin. Hence, in the next step, the sites at distance 2 from the origin receive about $n_2 := (\theta/(2d))\log n_1 \approx (\theta/(2d))\log\log n$ particles. The argument can be iterated, and each time one wants to colonize a new site, a log has to be taken. This may go on until we run out of particles, which happens after roughly $\log^* n$ iterations. At this point, a ball of radius $\log^* n$ has been colonized, which corresponds to a volume of about $(\log^* n)^d$. Using the work of Limic and Sturm for finite graphs (Theorem 5.9), at time t > 0 each of those sites will carry about O(1) particles. Theorem 5.11 follows after estimating the various errors made in this induction and showing that they are negligible. □
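The iterated-logarithm recursion in this sketch is easy to explore numerically. The Python snippet below iterates $n_{k+1} = (\theta/(2d))\log n_k$ from $n_0 = n$ and records the values while they are at least 1, i.e., while the k-th shell still receives a particle. It ignores all fluctuations and the O(1) corrections that the proof has to control. For simplicity it also assumes $\theta \le 2d$, so that the recursion is decreasing. The function name is hypothetical.

```python
import math


def colonized_shells(n, theta, d):
    """Heuristic occupation numbers n_1, n_2, ... of the successive shells.

    Iterates n_{k+1} = (theta / (2d)) * log(n_k), starting from n_0 = n, and
    stops as soon as a value drops below 1. Assumes theta <= 2d.
    """
    c = theta / (2 * d)
    if not 0 < c <= 1:
        raise ValueError("this sketch assumes 0 < theta <= 2d")
    shells, x = [], n
    while True:
        x = c * math.log(x)
        if x < 1:
            return shells
        shells.append(x)


print([round(v, 2) for v in colonized_shells(10**80, theta=2.0, d=1)])  # [184.21, 5.22, 1.65]
```

With $\theta/(2d) = 1$ the recursion is exactly the iterated logarithm, so the number of colonized shells stays within one of $\log^* n$.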

Applying this reasoning to measures Λ with the "regular variation" property of Definition 4.1 (such as the Beta distribution with parameters $(2-\alpha, \alpha)$) gives us:

Theorem 5.13. Let $1 < \alpha < 2$ and consider the spatial Λ-coalescent with regular variation of index α. Let $N_t$ be the number of particles at time t if initially there are n particles at the origin in $\mathbb{Z}^d$. Then there exist $c_1, c_2 > 0$ such that with probability 1 as $n \to \infty$,
$$c_1(\log\log n)^d \le N_t \le c_2(\log\log n)^d. \qquad (148)$$

5.2.4 Long-time behaviour

Consider the spatial Kingman coalescent, started from a large number of particles at the same site. The results from the previous section tell us that the particles quickly settle to a situation where there is a bounded number of particles per site in a rather "large" region of space ("large" in quotation marks, as we have seen that, after all, $\log^* n$ shouldn't be considered large for any practical purpose!). After this initial phase of decay, a different kind of behaviour kicks in, where the geometry of space plays a much more crucial role than before. Particles start diffusing and coalescence events become much rarer than before. The behaviour is then altogether rather similar to the case of instantaneously coalescing random walks: indeed, when two particles meet, they stand a decent chance of coalescing. Thus, starting from a situation where a ball of radius $m = \log^* n$ has about one particle per site (or $m = \log\log n$ for spatial Beta-coalescents), we can expect the density to start decaying like 1/t as in Theorem 5.1 for a while. Eventually particles start to "realise" that the initial condition wasn't infinite (i.e., there wasn't one particle per site on every site of the lattice, but only on a finite region of size approximately m). At that point the density essentially stops decaying. This takes place at a time of order $m^2$ because of the diffusivity of particles. At this time, the density is about $1/m^2$ and particles are distributed in a volume of about $m^d$. Thus, in dimensions $d \ge 3$, we expect about $m^{d-2}$ particles to survive forever. This heuristic is confirmed by the following result:

Theorem 5.14. Let $N_\infty$ be the number of particles that survive forever, if initially there are n particles at the origin, and let $m = \log^* n$. There exist some constants c > 0 and C > 0 (depending only on d) such that, if $d \ge 3$:



In dimension 2, since simple random walk is recurrent, every pair of particles is bound to meet infinitely often and thus to coalesce. Hence only one particle may survive forever. However, the heuristics above can be adapted to find the asymptotic rate of decay of the number of particles. As anticipated, the correct time-scale is of order $m^2$:

Theorem 5.15. Let d = 2 and let $N_t$ be the number of particles that survive up to time t > 0. If δ > 0, there exist some universal constants $c_1$ and $c_2$ such that



There are two statements hidden in (150). The first one says that the number of particles at time $\delta m^2$ is about $\log m$. The second says that if δ is large, then the constant in front of $\log m$ is of the order of $1/\log\delta$.

Proof. We are not going to offer any justification of Theorem 5.14 or 5.15, but we will give a rigorous argument for a lower bound on the expected number of survivors. This argument has the merit of making it clear that this is indeed the correct order of magnitude. The idea is the following. In the ball of radius m, initially label all particles $1, 2, \dots, Km^d$ for some K > 0, in some independent randomized way (for instance, uniformly at random). To every particle labelled $1 \le i \le V$, with $V = Km^d$, associate an independent continuous-time simple random walk $S^i$ started at $x_i$, the initial position of particle i. In the event of a coalescence between particles with labels i < j, we decree that the lower-labelled particle necessarily wins: after this event, both particles i and j follow the trajectory of the walk $S^i$.

Fix ε > 0 and let P be the population of particles with labels $1 \le i \le \varepsilon m^{d-2}$. It suffices to show that a typical member of P has a chance to survive forever that stays bounded away from zero asymptotically. However, a particle $i \in P$ may only disappear if it coalesces with a particle with a lower label than itself. In particular, it can only disappear if it coalesces with another member j of P. Let us assume for instance that this particle is sitting at the origin initially. If $j \in P$ is at position $x \in \mathbb{Z}^d$ initially, the probability that it ever meets $S^i$ is smaller than or equal to the expected number of visits to the origin by $S^j - S^i$. This is equal to the Green function G(x) of simple random walk. Now, it is standard that:
$$G(x) \sim c|x|^{2-d}. \qquad (151)$$

Hence, the expected number of coalescences K(i) of i with other members of P satisfies
$$\mathbb{E}(K(i)) \le \sum_{x} \mathbb{P}(\text{there is a particle from } P \text{ at } x) \times \mathbb{P}(\text{the two particles coalesce}) \le c\sum_{k=1}^{m} k^{d-1}\,k^{2-d}\,\varepsilon m^{-2} = c\,\varepsilon m^{-2}\sum_{k=1}^{m} k \le C\varepsilon.$$

The probability that i disappears is smaller than the expected number of coalescences with members of P, so the probability of survival is greater than $1 - C\varepsilon$. By making ε sufficiently small, this is greater than 1/2. We conclude that the expected number of particles that survive forever is at least $(1/2)|P| = (\varepsilon/2)m^{d-2}$. Some more precise computations on the dependence between these events for various particles i, together with a martingale argument, are enough to conclude for the lower bound in Theorem 5.14.

The upper bound, on the other hand, is much more delicate. The argument of [6] uses a multiscale analysis to show that the density decays roughly in inverse proportion to time in $d \ge 3$. The key difficulty is to control particles that may escape to unpopulated regions and thereby slow down coalescence. The multiscale approach allows one to bound the probability of such events at every stage. □
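The key point in the lower-bound computation is that the shell size $k^{d-1}$ and the Green function decay $k^{2-d}$ cancel up to a factor k in every dimension, so the bound on $\mathbb{E}(K(i))$ does not depend on d. The exact-arithmetic Python check below evaluates the sum without the constant c, for a few dimensions. The function name is ours.

```python
from fractions import Fraction


def coalescence_bound(m, eps, d):
    """sum_{k=1}^m k^(d-1) * k^(2-d) * eps * m^(-2), computed exactly.

    k^(d-1): number of sites at distance k; k^(2-d): decay of G(x) in (151);
    eps * m^(-2): density of the population P. The constant c is dropped.
    """
    total = sum(Fraction(k) ** (d - 1) * Fraction(k) ** (2 - d)
                for k in range(1, m + 1))
    return total * eps / m**2


eps = Fraction(1, 10)
for d in (3, 4, 5):
    print(d, coalescence_bound(100, eps, d))  # eps * (m + 1) / (2m) in every d
```

The value $\varepsilon(m+1)/(2m)$ never exceeds ε, which is the "$\le C\varepsilon$" step of the proof.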

5.3 Some continuous models 

At this stage the understanding of spatial Λ-coalescents is quite rough, and it would be of considerable interest to establish more precise results about the distribution of the location of the particles in the manner of, say, Theorem 5.4. The first step is to identify the effective branching rate in the time-reversed picture. There are several possible ways to do this. One of them is to draw some inspiration from a remarkable study by Hammond and Rezakhanlou [96] of a model of coalescing Brownian motions.

5.3.1 A model of coalescing Brownian motions 

In this model, particles perform Brownian motions in $\mathbb{R}^d$ with, say, $d \ge 3$, although Hammond and Rezakhanlou have also studied the case d = 2 in a separate paper [97] (see also [137] for a related model where masses may be continuous). Two particles may interact when they find themselves at a distance of order ε of one another, where ε > 0. One way to describe this interaction is that there is an exponential clock for every pair of particles. When the total time spent by this pair at distance less than ε exceeds that clock, the two particles coalesce and are replaced by a single particle at a "nearby location" (for simplicity, usually somewhere on the line segment which joins the two particles, although this isn't so important). In this model, the diffusivity of a particle also depends on the size of the block that it corresponds to. Thus there is a function d(n) > 0, usually non-increasing, such that a particle of mass n (i.e., made up of n particles having coalesced together) has diffusivity d(n). This models the physically reasonable situation where larger particles don't diffuse as fast as light particles. We are also given a function α(n,m) which represents the microscopic propensity for particles of masses n and m to coalesce; the rate of the exponential clock is then chosen to be $\varepsilon^{-2}\alpha(n,m)$.

Hammond and Rezakhanlou prove in [96], subject to certain conditions on the function d(n), that the density of particles rescales to an infinite hierarchy of PDEs called Smoluchowski's equations. The initial number of particles N is chosen so that every particle coalesces with a bounded number of particles in a unit time interval. A simple calculation based on the volume of the Wiener sausage of radius ε around the trajectory of a single Brownian motion shows that this happens if $N\varepsilon^{d-2} = Z$ is constant. With these conventions, their result is as follows. Let $g^\varepsilon_n(dx,t)$ denote the rescaled empirical distribution of particles of mass n at time t and at position x, that is, $g^\varepsilon_n$ is the point measure:
$$g^\varepsilon_n(dx,t) = \varepsilon^{d-2}\sum_{i \in V(t) \text{ of mass } n} \delta_{x_i(t)}(dx),$$
where V(t) is the set of particles alive at time t and $x_i(t)$ is the location of particle i. Then the main result of [96] is as follows.

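Theorem 5.16. For any smooth test function J(x), any n ≥ 1, and all t > 0, as ε → 0:
$$\int J(x)\,g^\varepsilon_n(dx,t) \to \int J(x)\,f_n(x,t)\,dx,$$
where the functions $f_n(x,t)$ satisfy:
$$\frac{\partial f_n}{\partial t} = d(n)\,\Delta f_n(x,t) + Q_1(f)(x,t) - Q_2(f)(x,t), \qquad (152)$$
where $Q_1$ and $Q_2$ are given by
$$Q_1(f) = \sum_{k=1}^{n-1}\beta(k,n-k)\,f_k f_{n-k}$$
and
$$Q_2(f) = 2f_n\sum_{k=1}^{\infty}\beta(n,k)\,f_k.$$

The numbers β(n,m) are given in terms of a certain PDE which involves α(n,m). First find the solution $u_{n,m}$ to
$$\Delta u_{n,m} = \frac{\alpha(n,m)}{d(n)+d(m)}\,(1 + u_{n,m}) \quad \text{in } B(0,1),$$
with zero boundary condition on the unit sphere $\partial B(0,1)$, and then set
$$\beta(n,m) = \big(d(n)+d(m)\big)\int_{B(0,1)}\Delta u_{n,m}(x)\,dx = \alpha(n,m)\int_{B(0,1)}\big(1 + u_{n,m}(x)\big)\,dx. \qquad (153)$$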
The equation (152) has an intuitive interpretation. Roughly, $f_n$ changes for three reasons. The first is motion, which gives the Laplacian term. The second is coalescence of two particles of masses k and n − k, where 1 ≤ k < n, which gives the first term $Q_1$. The third is coalescence of a particle of mass n with a particle of possibly different mass k ≥ 1; this is a loss, and it is responsible for the term $Q_2$. It should be emphasized that proving existence and uniqueness of solutions to (152) can be extremely difficult, essentially because some nontrivial gel may form (i.e., particles of infinite mass may be created in finite time). This is an old problem, and one that Hammond and Rezakhanlou have also helped to clarify in subsequent papers.
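Consider (152) without the spatial term and with a constant kernel β(n,m) ≡ β. This is a simplifying assumption, since in general β depends on n and m through (153). The system then reduces to the classical spatially homogeneous Smoluchowski equations. Summing over n shows that the total density $F = \sum_n f_n$ solves $F' = -\beta F^2$, so $F(t) = F(0)/(1+\beta F(0)t)$, while the total mass $\sum_n n f_n$ is conserved. The explicit Euler sketch in Python below is truncated at a maximal mass and checks both facts from the monodisperse start $f_1(0) = 1$. The truncation level and step size are arbitrary.

```python
def smoluchowski_constant_kernel(beta=1.0, n_max=40, t_end=1.0, dt=2e-3):
    """Explicit Euler for the space-free version of (152) with beta(n, m) = beta:

        df_n/dt = sum_{k=1}^{n-1} beta f_k f_{n-k} - 2 beta f_n sum_k f_k,

    started from f_1 = 1 and f_n = 0 for n >= 2, truncated at mass n_max.
    Returns the list f with f[n] the density of mass n (f[0] is unused).
    """
    f = [0.0] * (n_max + 1)
    f[1] = 1.0
    for _ in range(round(t_end / dt)):
        total = sum(f)
        new = f[:]
        for n in range(1, n_max + 1):
            gain = beta * sum(f[k] * f[n - k] for k in range(1, n))
            loss = 2 * beta * f[n] * total
            new[n] = f[n] + dt * (gain - loss)
        f = new
    return f


f = smoluchowski_constant_kernel()
print("total density:", sum(f), "(exact 0.5)")
print("total mass:", sum(n * fn for n, fn in enumerate(f)), "(exact 1.0)")
```

With β = 1 and $F(0) = 1$, the exact total density at time 1 is 1/2. This $1/t$-type decay is the space-free counterpart of the $-\beta f^2$ term in (154).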

One of the remarkable features of this result is that the macroscopic coagulation rates β(n,m) differ from the microscopic ones α(n,m). This reflects the fact that a kind of macroscopic averaging occurs and there is an "effective rate of coalescence". This rate takes into account how much particles effectively see each other when they diffuse and may coalesce with others.

The model of spatial Λ-coalescents may be viewed as a lattice approximation of the model of Hammond and Rezakhanlou in the particular case where the diffusivity d(n) does not depend on n. In that case, the hierarchy of PDEs (152) simplifies greatly: the total density $f = \sum_n f_n$ simply satisfies
$$\frac{\partial f}{\partial t} = d\,\Delta f - \beta f^2 \qquad (154)$$
for some β > 0, where d is the common value of the d(n). Thus it is tempting to conjecture that the equation (154) also describes the hydrodynamic limit of the density of particles at time t and at position x in spatial Λ-coalescents. The number β is the solution to a certain discrete difference equation which is the discrete analogue of (153).

5.3.2 A coalescent process in continuous space 

The fact that the number β in (154) does not agree with (153) is rather disconcerting. It is an indication that, even after taking the hydrodynamic limit, two features play an essential role in the macroscopic behaviour of the system: the discrete nature of the interactions, and the exact microscopic structure of the lattice on which they take place. This makes it doubtful that such models should be taken too seriously for modeling real populations. Instead, it is natural to ask for models that are more robust or universal, in the same way as Brownian motion is a valid approximation of many discrete systems, irrespective of their exact microscopic properties. Moreover, the kinds of models discussed above (spatial Λ-coalescents and the model of coalescing Brownian motions of Hammond and Rezakhanlou) fall roughly in the same class as coalescing random walks. We thus also anticipate that the genealogical relationships they describe are similar in some way to those of super-Brownian motion. It should however be pointed out that super-Brownian motion, although a rich source of mathematical problems in its own right, is rather inadequate as a model of populations living in a continuum. We refer the reader to the discussion in the introduction of [22] for the reasons. The difficulty is that it predicts that, if not extinct, at large times the population will form 'clumps' of arbitrarily large density and extent. This goes against the intuition that some kind of equilibrium is settling in.

To circumvent this, Etheridge has recently introduced in [73] a new model of coalescence in continuous space, based on a Poisson point process of events in a fashion roughly analogous to the Poissonian construction of Λ-coalescents. Suppose that we are given a measure μ(dr,du) on $\mathbb{R}_+ \times (0,1)$. The measure $dx \otimes \mu$ on $\mathbb{R}^2 \times \mathbb{R}_+ \times (0,1)$ indicates the rate at which a proportion $u \in (0,1)$ of lineages in a ball of radius r around any given point x coalesces. The location of the newly formed particle is then chosen either uniformly in the ball of radius r around x, or precisely at x. The analysis of this process is only at its very initial stage at the moment. To start with, even the existence of the model does not appear completely trivial. One needs some conditions on μ which guarantee that not too many events happen in a given compact set. For instance, the trajectory of a given particle will be a Lévy process, and one set of conditions on μ comes from there. Another set is purely analogous to the Λ-coalescent condition. Conditions for the existence of the process and some of its properties are analysed in [10], who rely on a modification of a result due to Evans [76]. Among other things, they study scaling limits of this process when space is no longer the full plane but rather a large two-dimensional torus. In this setting, the measure μ(dr,du) may be decomposed as a sum of two measures: one corresponding to "big events", which involve a large portion of the space (such as a major ecological catastrophe), and one for small (local) events. They show that there is a large spectrum of coalescing processes which may be obtained in the limit to describe the genealogies of the population. This contrasts with the spatial Λ-coalescent model of Limic and Sturm, where the scaling limit is always Kingman's coalescent, regardless of the measure Λ (Theorem 5.10).

We note that a model of a discrete population evolving in contin- 
uous space is described in the paper [22], where the main result is 
that for certain parameter values the process is ergodic in any di- 
mension. (As noted above, this contrasts sharply with population 
models based on super-Brownian motion). It is believed that the co- 
alescent process associated with this model converges in the scaling 
limit to Etheridge's process. See [73, 22, 23] for details. 

6 Spin glass models and coalescents

In this chapter we take a look at some developments which relate a certain description of spin glass models to a coalescent process known as the Bolthausen-Sznitman coalescent. This is the Λ-coalescent which arises when one takes Λ to be simply the uniform measure on (0, 1). We first introduce a beautiful representation of this process in terms of certain discrete random trees, discovered by Goldschmidt and Martin [90], and use this description to prove some properties of this process. We then introduce the famous Sherrington-Kirkpatrick model from spin glass theory, as well as the simplification suggested by Derrida known as the Generalized Random Energy Model (GREM). We describe Bovier and Kurkova's result [40] that the Bolthausen-Sznitman coalescent describes the statistics of the GREM, as well as Bertoin and Le Gall's connection [29] to Neveu's continuous-state branching process. Finally, we describe some recent outstanding conjectures by Brunet and Derrida [46, 47], which are related to this and to several other subjects such as random travelling waves and population models with selection, together with ongoing work in this direction.

6.1 The Bolthausen-Sznitman coalescent 

Definition 6.1. The Bolthausen-Sznitman coalescent $(\Pi_t, t \ge 0)$ is the $\mathcal{P}$-valued Λ-coalescent obtained by taking the measure Λ to be the uniform measure $\Lambda(dx) = dx$ on (0, 1).

Thus the transition rates of the Bolthausen-Sznitman coalescent are computed as follows: for every $n \ge 2$ and every $2 \le k \le b$, if the restriction of Π to [n] has exactly b blocks, then any given k-tuple of blocks coalesces at rate

$$\lambda_{b,k} = \int_0^1 x^{k-2}(1-x)^{b-k}\,dx = \frac{(k-2)!\,(b-k)!}{(b-1)!}. \qquad (155)$$
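As a quick sanity check of (155), the following Python sketch (function names are ours, chosen for illustration) evaluates the Beta integral exactly with rational arithmetic and compares it with the closed form. It also checks that the total merger rate out of a state with b blocks, $\sum_{k=2}^{b}\binom{b}{k}\lambda_{b,k}$, equals $b-1$; this is also the number of edges of a recursive tree on b vertices, which is relevant for the construction below.

```python
from fractions import Fraction
from math import comb, factorial


def bs_rate(b, k):
    """Rate (155) at which a given k-tuple of blocks merges when there are b blocks."""
    return Fraction(factorial(k - 2) * factorial(b - k), factorial(b - 1))


def beta_integral(b, k):
    """Exact value of the integral of x^(k-2) (1-x)^(b-k) over [0, 1], obtained by
    expanding (1-x)^(b-k) with the binomial theorem and integrating term by term."""
    m = b - k
    return sum(Fraction((-1) ** j * comb(m, j), k - 1 + j) for j in range(m + 1))


def total_rate(b):
    """Total rate at which some merger happens when there are b blocks."""
    return sum(comb(b, k) * bs_rate(b, k) for k in range(2, b + 1))


if __name__ == "__main__":
    for b in range(2, 7):
        print(b, [str(bs_rate(b, k)) for k in range(2, b + 1)], total_rate(b))
```

The telescoping identity behind the total rate is $\binom{b}{k}\lambda_{b,k} = b/(k(k-1))$, whose sum over $2 \le k \le b$ is $b(1 - 1/b) = b - 1$.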



6.1.1 Random recursive trees 

We follow the approach of Goldschmidt and Martin [90] which shows 
a representation of the Bolthausen-Sznitman coalescent in terms of 
certain random trees called recursive trees. 
Definition 6.2. A recursive tree on [n] is a labelled tree with n vertices such that the label of the root is 1, and the labels of the vertices along any non-backtracking path starting from the root are monotone increasing.

In other words, the label of a child is greater than the label of its parent. There are exactly $(n-1)!$ recursive trees on [n]: indeed, suppose that a tree on [j], with $1 \le j \le n-1$, has been constructed. Then the vertex with label $j+1$ can be added as the child of any of the j vertices already present in the tree. It follows directly from this description that a random recursive tree (i.e., a recursive tree chosen uniformly at random among the $(n-1)!$ possibilities) can be obtained by a variation of the above procedure: namely, having constructed a uniform random recursive tree on [j] with $1 \le j \le n-1$, one obtains a uniform random recursive tree on $[j+1]$ by attaching the vertex with label $j+1$ to one of the j vertices of the tree, chosen uniformly at random.
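The counting argument and the sequential construction can be sketched in Python. The encoding of a tree as a parent map `{v: parent of v}` and all function names are our own choices for illustration: the first function enumerates all recursive trees, the second samples one by uniform sequential attachment, and the last tallies how often the sampler produces each tree.

```python
import random
from collections import Counter
from itertools import product
from math import factorial


def all_recursive_trees(n):
    """Yield every recursive tree on [n] as a parent map {v: parent of v}.

    Vertex v (2 <= v <= n) may hang below any of the v - 1 smaller labels,
    which is exactly the counting argument giving (n - 1)! trees."""
    for parents in product(*(range(1, v) for v in range(2, n + 1))):
        yield dict(zip(range(2, n + 1), parents))


def random_recursive_tree(n, rng=random):
    """Uniform random recursive tree on [n]: vertex j + 1 attaches to a
    uniformly chosen vertex among 1, ..., j."""
    return {v: rng.randint(1, v - 1) for v in range(2, n + 1)}


def is_recursive(parent):
    """Labels increase along every path away from the root 1."""
    return all(p < v for v, p in parent.items())


def empirical_counts(n, samples, rng=random):
    """How often each recursive tree on [n] is produced by the sampler."""
    return Counter(
        tuple(sorted(random_recursive_tree(n, rng).items())) for _ in range(samples)
    )
```

For n = 4 the sampler should hit each of the 3! = 6 trees roughly equally often.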

There is a natural operation on recursive trees, which is that of lifting an edge of the tree. This means the following: assume that the edge $e = (i_1, i_2)$ which is being lifted connects two labels $i_1 < i_2$, so that $i_1$ is closer to the root than $i_2$. Let $i_2, i_3, \dots, i_k$ be the collection of labels in the subtree rooted at $i_2$, i.e., below the edge e. Then this subtree is deleted and the label of $i_1$ becomes $\{i_1, i_2, \dots, i_k\}$. Graphically, all vertices below the edge bubble up to $i_1$ and stay there (see Figure 12 for an example).
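The lifting operation is easy to implement if each vertex is identified by the smallest label it carries, since then every parent precedes its children. A minimal Python sketch, with our own encoding (a parent map plus a map from vertices to frozensets of labels):

```python
def subtree(parent, v):
    """Vertices of the subtree rooted at v.  Vertices are identified by their
    smallest label, so in a recursive tree every parent precedes its children."""
    below = {v}
    for w in sorted(parent):
        if w > v and parent[w] in below:
            below.add(w)
    return below


def lift(labels, parent, v):
    """Lift the edge (parent[v], v).

    labels: dict vertex -> frozenset of labels carried by that vertex
    parent: dict vertex -> parent vertex (the root has no entry)
    Returns the new (labels, parent): the subtree rooted at v is deleted and
    all of its labels are merged into the label of parent[v]."""
    p = parent[v]
    gone = subtree(parent, v)
    new_labels = {u: lab for u, lab in labels.items() if u not in gone}
    new_labels[p] = labels[p].union(*(labels[w] for w in gone))
    new_parent = {u: q for u, q in parent.items() if u not in gone}
    return new_labels, new_parent
```

For example, on the tree on [6] with parents 2→1, 3→2, 4→2, 5→1, 6→3, lifting the edge (2, 3) gives vertex 2 the label {2, 3, 6}; lifting (1, 2) afterwards leaves only the root, labelled {1, 2, 3, 4, 6}, and the vertex 5.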

Goldschmidt and Martin [90] use the word cutting for the oper- 
ation just described, but we prefer the word lifting as it is more 
suggestive: we have in mind that the subtree below the edge is given 
a lift up to that vertex. 

This leads us to a slightly more general definition of recursive tree. Let $\pi = (B_1, \dots, B_k)$ be a partition of [n], with the usual convention that blocks are ordered by their least element.

Definition 6.3. A recursive tree on π is a labelled tree with k vertices, whose k vertices are labelled by the k blocks of the partition. The label of the root is $B_1$, and the labels of the vertices along any non-backtracking path starting from the root are monotone increasing for the block order.

Given a partition $\pi = (B_1, \dots, B_k)$ there are naturally exactly $(k-1)!$ possible recursive trees on π, and the previous notion of recursive trees corresponds to the case where π is the trivial partition into singletons.

Figure 12: Lifting of successive edges of a recursive tree on 10 vertices.

With these definitions, we are able to state the following result, which is due to Goldschmidt and Martin [90] and which gives a striking construction of the Bolthausen-Sznitman coalescent in terms of random recursive trees. Let $n \ge 1$, and let T be a uniform random recursive tree on [n]. Endow each edge e of T with an independent exponential random variable $\tau_e$ with mean 1, and lift the edge e at time $\tau_e$ (if this edge has already been deleted, because it lies in a subtree removed by an earlier lifting, nothing happens). The label sets of the trees obtained by these successive liftings define a random partition-valued process $(\Pi^n(t), t \ge 0)$.

Theorem 6.1. The process $(\Pi^n(t), t \ge 0)$ has the same distribution as the restriction to [n] of the Bolthausen-Sznitman coalescent.
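Theorem 6.1 translates directly into a simulation. A vertex is still present at time t exactly when every edge on its path to the root has a clock larger than t, and each label ends up in the block of its deepest ancestor that is still present; this lets us read off $\Pi^n(t)$ in one pass over the vertices. As a check, for the restriction to [3] the rates (155) give $\lambda_{3,2} = \lambda_{3,3} = 1/2$ and $\lambda_{2,2} = 1$: three blocks are left at total rate 2, towards one block with probability 1/4, and a short computation gives $P(1, 2, 3 \text{ in one block at time } t) = 1 - \tfrac32 e^{-t} + \tfrac12 e^{-2t}$. The Python sketch below (names are ours) simulates with n = 6, so that the check also exercises consistency under restriction.

```python
import math
import random


def gm_partition(n, t, rng=random):
    """Partition of [n] at time t in the lifting construction of Theorem 6.1.

    A uniform random recursive tree is grown by sequential attachment; the edge
    from v to its parent carries an Exp(1) clock.  A vertex is still present at
    time t iff no edge on its path to the root has rung by time t, and every
    label ends up in the block of its deepest ancestor that is still present."""
    parent = {v: rng.randint(1, v - 1) for v in range(2, n + 1)}
    present = {1: True}
    rep = {1: 1}
    for v in range(2, n + 1):            # parents have smaller labels than children
        p = parent[v]
        # the clock of edge (p, v) only matters if p itself is still present
        present[v] = present[p] and rng.expovariate(1.0) > t
        rep[v] = v if present[v] else rep[p]
    blocks = {}
    for v in range(1, n + 1):
        blocks.setdefault(rep[v], set()).add(v)
    return sorted(blocks.values(), key=min)


def prob_exact_123(t):
    """P(1, 2, 3 in one block at time t) for the Bolthausen-Sznitman coalescent,
    from the rates lambda_{3,2} = lambda_{3,3} = 1/2 and lambda_{2,2} = 1."""
    return 1 - 1.5 * math.exp(-t) + 0.5 * math.exp(-2 * t)
```

The first block returned always contains 1, since blocks are sorted by their least element.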

Proof. The proof is not very difficult. The main lemma, which in [90] is generously attributed to Meir and Moon [119, 120], is the following (note that it is not literally the same as the one given in [90], where slightly more is proved).

Lemma 6.1. Let L be a given label set with b elements, and let T be a random recursive tree on L. Let e be an edge of T picked uniformly at random, independently of T, and let T' be the recursive tree obtained by lifting the edge e. Then, conditionally on the label set L' of T', T' is a uniform random recursive tree on L'.

Proof. Fix a label set ℓ' such that L is more refined than ℓ', i.e., ℓ' has been obtained from L by coalescing certain blocks, and let t' be a given recursive tree on ℓ'. Let us compute $P(T' = t', L' = \ell')$. Since both L and ℓ' are given, we know exactly which blocks of L must have coalesced. Let us call $L = \{\ell_1, \dots, \ell_b\}$ the labels of T, ordered naturally, and let $\{\ell_{j_1}, \dots, \ell_{j_k}\}$, with $j_1 < \dots < j_k$, be those labels that coalesce. Let $M_1 = \{\ell_{j_2}, \dots, \ell_{j_k}\}$ and $M_2 = L \setminus M_1$. Let us consider the various ways in which the event $\{T' = t'\}$ may occur. First build a recursive tree $t_1$ on $M_1$ rooted at $\ell_{j_2}$ (there are $(k-2)!$ ways of doing so), and consider the recursive tree $t_2$ on $M_2$ obtained by changing the label $\ell_{j_1} \cup \dots \cup \ell_{j_k}$ of t' into $\ell_{j_1}$. Link $\ell_{j_1}$ to $\ell_{j_2}$ by an edge e. The tree T must have been the one obtained by the junction of $t_1$ and $t_2$ (which has probability exactly $1/(b-1)!$), and the edge e linking $\ell_{j_1}$ to $\ell_{j_2}$ must be lifted (which has probability $1/(b-1)$). Thus

$$P(T' = t', L' = \ell') = \frac{(k-2)!}{(b-1)!\,(b-1)}. \qquad (156)$$

In particular, after division by $P(L' = \ell')$, the above does not depend on t', and thus T' is a uniform random recursive tree on ℓ'. □

It is now a simple game to conclude that the process $\Pi^n(t)$ has the Markov property and the transition rates of the Bolthausen-Sznitman coalescent. Indeed, the lemma above shows that, conditionally on $\Pi^n(t)$, the tree $T^n(t)$ is a uniform random recursive tree labelled by $\Pi^n(t)$. Moreover, since there are $b-1$ edges in the tree, each lifted at rate 1, the total rate at which a merger of k given blocks occurs (say $\ell_{j_1}, \dots, \ell_{j_k}$ among the b blocks $\ell_1, \dots, \ell_b$) is exactly $b-1$ times $P(L' = \ell')$, using the notation of the above lemma. This may be computed directly as in the lemma, as all that is left to do is choose one of the $(b-k)!$ recursive trees on $M_2$, so

$$P(L' = \ell') = \frac{(k-2)!\,(b-k)!}{(b-1)\,(b-1)!} = \frac{1}{(b-1)^2}\,\frac{(k-2)!\,(b-k)!}{(b-2)!} = \frac{1}{(b-1)^2}\binom{b-2}{k-2}^{-1}. \qquad (157)$$

(Another way to obtain (157) is that since T' is uniform conditionally given L', and since we already know that there are $(b-k)!$ recursive trees on ℓ', we conclude from (156) that $P(L' = \ell')$ is $(b-k)!$ times the right-hand side of (156), which is precisely (157).)

Thus, multiplying (157) by $(b-1)$, the rate at which $\ell_{j_1}, \dots, \ell_{j_k}$ merge is exactly the same as $\lambda_{b,k}$ for the Bolthausen-Sznitman coalescent (155). This finishes the proof. □
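Lemma 6.1 and formula (157) can be verified exhaustively for small b. The Python sketch below (our own encoding, with singleton labels 1, ..., b) enumerates all $(b-1)!\,(b-1)$ equally likely pairs (tree, edge), lifts the edge, and checks that (i) each merged set of size k occurs with the probability given by (157), and (ii) given the merged set, all $(b-k)!$ recursive trees on the new label set occur equally often.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from itertools import product
from math import factorial


def all_recursive_trees(b):
    """Every recursive tree on {1, ..., b} as a parent map {v: parent}."""
    for parents in product(*(range(1, v) for v in range(2, b + 1))):
        yield dict(zip(range(2, b + 1), parents))


def lift_edge(parent, v):
    """Lift the edge (parent[v], v) of a tree whose labels are singletons.

    Returns the set of labels that merge, and the new tree encoded as a
    frozenset of (child block, parent block) pairs."""
    b = len(parent) + 1
    below = {v}
    for w in range(v + 1, b + 1):            # children have larger labels
        if parent[w] in below:
            below.add(w)
    p = parent[v]
    merged = frozenset(below | {p})

    def block(u):
        return merged if u == p else frozenset({u})

    tree = frozenset((block(w), block(q)) for w, q in parent.items() if w not in below)
    return merged, tree


def lemma_statistics(b):
    """Tally the outcomes of lifting a uniform edge of a uniform recursive tree."""
    outcomes = defaultdict(Counter)
    for parent in all_recursive_trees(b):
        for v in parent:                      # each of the b - 1 edges
            merged, tree = lift_edge(parent, v)
            outcomes[merged][tree] += 1
    return outcomes, factorial(b - 1) * (b - 1)   # number of equally likely pairs


def formula_157(b, k):
    """P(L' = l') from (157) for a merger of k given labels out of b."""
    return Fraction(factorial(k - 2) * factorial(b - k), (b - 1) * factorial(b - 1))
```

Every subset of size at least 2 turns out to be a possible merged set, so there are $2^b - b - 1$ of them.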



6.1.2 Properties 

Theorem 6.1 allows us to prove very simply a number of interest- 
ing properties about the Bolthausen-Sznitman coalescent, some of 
which were already discovered by Pitman [131] although using more 
involved arguments. We start with the following result: 

Theorem 6.2. Let $(\Pi_t, t \ge 0)$ be the Bolthausen-Sznitman coalescent. Then for every $t > 0$,

$$\Pi_t \overset{d}{=} \mathrm{PD}(e^{-t}, 0),$$

i.e., $\Pi_t$ has the Poisson-Dirichlet distribution with $\alpha = e^{-t}$ and $\theta = 0$. In particular, Π does not come down from infinity.

Proof. The proof is simple from the construction of Theorem 6.1: all we have to do is observe that the Chinese Restaurant Process is embedded in the construction of random recursive trees. Indeed, let $E_1, E_2, \dots$ be independent exponential random variables with mean 1. Fix a time $t > 0$ and imagine constructing a random recursive tree T on [n] by adding the vertices one at a time. We also put a mark on the edge which links vertex i to its parent if and only if $E_i < t$. We interpret a mark as saying that the edge has been lifted prior to time t, but rather than collapsing it we keep it in place and simply keep in mind that we have to do the lifting operation in order to obtain $\Pi^n(t)$. Suppose that after collapsing those edges, we would have k blocks of respective sizes $n_1, \dots, n_k$, with smallest elements $i_1, \dots, i_k$. When vertex $n+1$ arrives, it forms a new block if it attaches to one of $i_1, \dots, i_k$ and its edge is not marked, which has probability $k e^{-t}/n$. Otherwise, it becomes part of the block of size $n_j$ if it attaches to one of the $n_j - 1$ other vertices of that block (regardless of whether its edge has a mark), or if it is attached to $i_j$ and its edge is marked. The probability that this happens is

$$\frac{n_j - 1 + (1 - e^{-t})}{n} = \frac{n_j - \alpha}{n},$$

where $\alpha = e^{-t}$. Thus, as n grows, $\Pi^n(t)$ is a Chinese Restaurant Process with parameters $\alpha = e^{-t}$ and $\theta = 0$, so $\Pi_t$ has the $\mathrm{PD}(e^{-t}, 0)$ distribution. One can deduce from this and Theorem 1.11 that the number of blocks of $\Pi^n(t)$ is of order $n^{\alpha}$, with $\alpha = e^{-t}$. Since this tends to ∞ as $n \to \infty$, this proves Theorem 6.2. □
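The $(\alpha, 0)$ Chinese restaurant dynamics in this proof can be simulated directly. Since at step m a new block appears with probability $K_m\alpha/m$, we have $E[K_{m+1}] = E[K_m](1 + \alpha/m)$, hence $E[K_n] = \prod_{m=1}^{n-1}(1 + \alpha/m)$. The Python sketch below (names are ours) compares this exact value with a Monte Carlo average for $\alpha = e^{-1}$, i.e., t = 1.

```python
import math
import random


def crp_block_sizes(n, alpha, rng=random):
    """Block sizes of a Chinese restaurant process with parameters (alpha, 0).

    With m customers seated in k blocks of sizes n_1, ..., n_k, customer m + 1
    opens a new block with probability k * alpha / m and joins block j with
    probability (n_j - alpha) / m."""
    sizes = [1]
    for m in range(1, n):
        u = rng.random() * m
        if u < len(sizes) * alpha:
            sizes.append(1)
            continue
        u -= len(sizes) * alpha
        for j, s in enumerate(sizes):
            u -= s - alpha
            if u < 0:
                sizes[j] += 1
                break
        else:                      # only reachable through floating-point round-off
            sizes[-1] += 1
    return sizes


def expected_blocks(n, alpha):
    """Exact E[K_n] = prod_{m=1}^{n-1} (1 + alpha / m)."""
    return math.prod(1 + alpha / m for m in range(1, n))
```

The block count has a large relative spread for small α (its rescaled limit is random, by Theorem 1.11), so the Monte Carlo average needs a few thousand runs to be accurate.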

Another striking application of Theorem 6.1 is the following description of the frequency of a size-biased picked block. That is, consider F(t), the asymptotic frequency of the block containing 1 in $\Pi_t$, where $(\Pi_t, t \ge 0)$ is the Bolthausen-Sznitman coalescent.

Theorem 6.3. The distribution of F(t) is the $\mathrm{Beta}(1-\alpha, \alpha)$ distribution, where $\alpha = e^{-t}$. Moreover, we have the following identity in distribution for the process $(F(t), t \ge 0)$:

$$\big(F(t), t \ge 0\big) \overset{d}{=} \left(\frac{\gamma(1 - e^{-t})}{\gamma(1)}, t \ge 0\right), \qquad (158)$$

where $(\gamma(s), s \ge 0)$ is the Gamma subordinator, i.e., the process with independent stationary increments such that $P(\gamma(s) \in dx) = \Gamma(s)^{-1} x^{s-1} e^{-x}\,dx$.

Note that the right-hand side of (158) is Markovian, hence so is the process $(F(t), t \ge 0)$. There is no obvious reason why this should be the case, and in fact we do not know of any other example where this happens. Another consequence of this fact is that $-\log(1 - F(t))$ has independent (but not stationary) increments.

Proof. It is easier to think of a random recursive tree on the label set $\{0, \dots, n\}$, with thus $n+1$ vertices. Note that as we build the random recursive tree on these vertices, the partition $P_n$ of $\{1, \dots, n\}$ obtained by looking at which vertices are in the same component of the tree if we were to cut all edges connected to the root 0 is exactly a Chinese Restaurant Process, but this time with parameters $\alpha = 0$ and $\theta = 1$: vertex $j+1$ attaches to 0, and hence starts a new component, with probability $1/(j+1)$. Thus it has the same distribution as the partition induced by the cycles of a uniform random permutation. It follows that in the limit, the normalized component sizes have precisely the PD(0, 1) distribution. Now use the construction of Theorem 6.1 on $n+1$ vertices, with edges marked as in the proof of Theorem 6.2, to see that the ranked jumps of F(t), say $J_1 \ge J_2 \ge \dots$, are precisely given by the ranked components of a PD(0, 1) random variable. This sequence of jumps is furthermore independent of the corresponding jump times $(T_1, T_2, \dots)$, which are by construction independent exponential random variables with mean 1. It is fairly simple to see that these three properties and the Poisson construction of PD(0, 1) partitions (see the remark after Theorem 1.7) imply Theorem 6.3. □
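Identity (158) gives a simple way to sample the whole process F. The Python sketch below (names are ours) builds γ from independent Gamma-distributed increments, using that the increment of a Gamma subordinator over an interval of length Δs is Gamma(Δs, 1), and checks at t = 1 the mean $1-\alpha$ and variance $\alpha(1-\alpha)/2$ of the $\mathrm{Beta}(1-\alpha, \alpha)$ distribution.

```python
import math
import random


def sample_F_path(times, rng=random):
    """One sample of (F(t) for t in times), times increasing, using the
    right-hand side of (158): F(t) = gamma(1 - e^{-t}) / gamma(1), where gamma is
    a Gamma subordinator built from independent Gamma-distributed increments."""
    values, s_prev, g = [], 0.0, 0.0
    for t in times:
        s = 1.0 - math.exp(-t)
        if s > s_prev:
            g += rng.gammavariate(s - s_prev, 1.0)   # increment over (s_prev, s]
            s_prev = s
        values.append(g)
    rest = 1.0 - s_prev
    g_total = g + (rng.gammavariate(rest, 1.0) if rest > 0 else 0.0)
    return [v / g_total for v in values]
```

Because the path is built from shared increments, sampled values are automatically nondecreasing in t, as they must be for the frequency of the block containing 1.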

Analysing in greater detail the probabilistic structure of random recursive trees (which turns out to involve some intriguing number-theoretic expansions), Goldschmidt and Martin are able to obtain some refined estimates on the limiting behaviour of the Bolthausen-Sznitman coalescent restricted to [n], close to the final coagulation time. We discuss a few of those results.

The following says that the sum $M_n$ of the masses of the blocks not containing 1 in the final coalescence of $(\Pi^n(t), t \ge 0)$ is approximately $n^U$, where U is a uniform random variable. More precisely:
Theorem 6.4. Let $M_n$ be as above and let $B_n$ be the number of blocks involved in the last coalescence event. Then



where U is a uniform random variable, $(Y_t, t \ge 0)$ is a standard Yule process, E is a standard exponential random variable, and Y, U, E are independent.

The Yule process is a continuous-time Galton-Watson process in which each individual branches at rate 1 into exactly two offspring. The convergence of the second term on the left-hand side indicates that there is a nondegenerate limit for the number of blocks in the last coalescence event. One can similarly ask about the number of blocks involved in the next-to-last coalescence, and so on. Let $(M_n(1), M_n(2), \dots)$ be this sequence of random variables, i.e., $M_n(i)$ is the number of blocks involved in the i-th coalescence event from the end. Goldschmidt and Martin [90] show that this sequence converges in the sense of finite-dimensional distributions towards a nondegenerate Markov chain, which converges to infinity almost surely. They interpret this last result as a post-gelation phase, where most of the mass has already coagulated and the remaining small blocks are progressively being absorbed.

Along the same lines, they obtain a result concerning the time at which the last coalescence occurs. Naturally, this time diverges to ∞ since Π does not come down from infinity, and Goldschmidt and Martin establish the following asymptotics:

Theorem 6.5. Let $T_n$ be the time of the last coalescence event. Then

$$T_n - \log\log n \to -\log E$$

in distribution, where E is an exponential random variable with mean 1.

This means that the order of magnitude of the last coalescence time is about $\log\log n$. This could have been anticipated from the fact that, by Theorem 6.2 and Theorem 1.11, the number of blocks at time t is about $n^{\alpha}$, with $\alpha = e^{-t}$. Since $n^{e^{-t}} = \exp(e^{-t}\log n)$, this becomes of order 1 when $e^{-t}\log n$ is of order 1, i.e., when $t = \log\log n + O(1)$.

Finally, a result of Panholzer [127] (see also Theorem 2.4 in [90]) about the number of cuts needed to isolate the root in a random recursive tree implies the following result.
Theorem 6.6. Let $T_n$ be the total number of coalescence events of a Bolthausen-Sznitman coalescent process started from n particles. Then

$$\frac{\log n}{n}\, T_n \to 1 \qquad (159)$$

in probability, as $n \to \infty$.

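Theorem 6.6 can be explored numerically through the lifting construction: by the memoryless property of the exponential clocks, the next edge to be lifted is uniform among the edges still present, and lifting an edge deletes the subtree below it. Counting coalescence events is therefore the same as counting the cuts needed to isolate the root. In the Python sketch below (names are ours), the averaged ratio $T_n \log n / n$ at n = 2000 should be of order 1; convergence in (159) is slow, so values somewhat above 1 are to be expected at this size.

```python
import math
import random


def coalescence_events(n, rng=random):
    """Number of coalescence events of the Bolthausen-Sznitman coalescent on [n],
    simulated through Theorem 6.1.

    By memorylessness of the Exp(1) clocks, the next edge to be lifted is uniform
    among the edges still present, and lifting deletes the subtree below it.  So
    we count cuts until only the root is left (Meir-Moon cutting)."""
    children = [[] for _ in range(n + 1)]
    for v in range(2, n + 1):
        children[rng.randint(1, v - 1)].append(v)
    present = set(range(2, n + 1))     # non-root vertices = edges to their parents
    cuts = 0
    while present:
        v = rng.choice(tuple(present))
        stack = [v]
        while stack:                   # delete the whole subtree rooted at v
            w = stack.pop()
            present.discard(w)
            stack.extend(children[w])
        cuts += 1
    return cuts
```

Descendants of a present vertex are always present, because every cut removes a whole subtree; this is why the deletion can simply walk the original child lists.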
6.1.3 Further properties 

Many other properties of the Bolthausen-Sznitman coalescent have been studied intensively. The time reversal of the Bolthausen-Sznitman coalescent is studied by Basdevant [11] and is shown to be an inhomogeneous fragmentation process after an exponential time change. A similar idea was already present in the seminal paper of Pitman [131]. In fact, this process is closely related to the "Poisson cascade" introduced even earlier by Ruelle [138], and it was Bolthausen and Sznitman [37] who noticed that an exponential time change transformed the process into a remarkable coalescent process. Pitman later realised that this coalescent was an example of the coalescents with multiple collisions which he was considering.

The allelic partition of the Bolthausen-Sznitman coalescent was studied by Basdevant and Goldschmidt [12], using an elegant martingale argument which fits into the theory of fluid limits developed by Darling and Norris [58]. They were able to show that if there is a constant mutation rate $\rho > 0$, then almost all types are singletons, meaning that they are represented in only one individual (their multiplicity is 1). More precisely, they showed:

Theorem 6.7. Let $M_k(n)$ denote the number of types with multiplicity k in the Bolthausen-Sznitman coalescent started from n individuals, and let M(n) be the total number of types. Then, as $n \to \infty$,

$$\frac{\log n}{n}\, M(n) \to \rho$$

in probability, and for $k \ge 2$,

$$\frac{(\log n)^2}{n}\, M_k(n) \to \frac{\rho}{k(k-1)}$$

in probability.
A similar result was first proved by Drmota et al. [69] for the total length of the coalescence tree, rather than the number of types. The biological interpretation of this result is less clear, since we do not have any evidence that this coalescent process is appropriate for modelling the genealogies of any species. However, it is believed that the Bolthausen-Sznitman coalescent describes a universal scaling limit for certain models with strong selection, as will be discussed below.

6.2 Spin glass models 
6.2.1 Derrida's GREM 

We start with a heuristic description of the model invented by Derrida, known as the GREM (for Generalized Random Energy Model). The first version of the model was introduced in [60], and it was generalized in [61] to incorporate several energy levels. This idea was followed up by Bovier and Kurkova in the form of the Continuous Random Energy Model (CREM), which is the version we now discuss. We start by stating the problem and giving the result of Bovier and Kurkova [40] about this model, followed by a brief description of some of the ingredients in the proof. We then explain the relation to the Sherrington-Kirkpatrick model.

The model is as follows. Let $N > 1$ and consider the N-dimensional hypercube $S_N = \{-1, 1\}^N$. An element $\sigma \in S_N$ is a spin configuration, i.e., an assignment of ±1 spins to $1, \dots, N$. We identify $S_N$ with the N-th level $T_N$ of the binary tree T as follows: if $\sigma \in S_N$, then σ may be written as a sequence of −1, +1, say $\sigma = \sigma_1 \cdots \sigma_N$, and we interpret this sequence as describing the path from the root of the binary tree to the vertex σ at the N-th level of the tree: the first vertex is the root, the second is the left child of the root if $\sigma_1 = -1$ and the right child of the root if $\sigma_1 = +1$. The third vertex in this path is the left child of the preceding vertex if $\sigma_2 = -1$, and its right child if $\sigma_2 = +1$, and so on.

Given two spin configurations σ and τ, there is a natural distance between them, which is the genealogical metric:

$$d(\sigma, \tau) = 1 - \frac{1}{N}\max\{0 \le i \le N : \sigma_j = \tau_j \text{ for all } j \le i\}. \qquad (160)$$

Thus for $0 < \varepsilon < 1$, the distance between σ and τ is at most ε if and only if the paths from the root to σ and τ are identical up to level $(1 - \varepsilon)N$.






In other words, the distance $d(\sigma, \tau)$ is 1 minus the normalized level of the most recent common ancestor of σ and τ. We then assume that we are given a nondecreasing function $A : [0, 1] \to [0, 1]$ such that $A(0) = 0$ and $A(1) = 1$. Consider now a centered Gaussian field $(X_\sigma, \sigma \in S_N)$ specified by the following covariance structure:

$$\operatorname{cov}(X_\sigma, X_\tau) = A(1 - d(\sigma, \tau)). \qquad (161)$$

Thus, with this definition, spin configurations σ and τ that are closely related genealogically are also highly correlated for the Gaussian field X. On the other hand, for spin configurations whose most recent common ancestor is close to the root of the tree, the values of the field at these two configurations are nearly independent.

In the GREM, one fixes a parameter $\beta > 0$ and considers the Gibbs distribution with inverse temperature β defined as follows:

$$\mu_\beta(\sigma) = \frac{1}{Z}\, e^{\beta\sqrt{N} X_\sigma}, \qquad (162)$$

where Z is a normalizing (random) constant chosen so that $\sum_\sigma \mu_\beta(\sigma) = 1$ almost surely. Thus the Gibbs distribution favours the spin configurations σ such that $X_\sigma$ is large. Now, consider sampling k spin configurations $\sigma^1, \dots, \sigma^k$ independently according to the Gibbs distribution. A natural question is to ask what the genealogical structure spanned by these spin configurations is, i.e., what the law of the subtree of T obtained by joining $\sigma^1, \dots, \sigma^k$ to the root is. The next result, which is due to Bovier and Kurkova [40], shows that this is, up to a time change, asymptotically the same as the Bolthausen-Sznitman coalescent.

More precisely, let $\Pi^k(t)$ be the partition of [k] defined by: $i \sim j$ if and only if $d(\sigma^i, \sigma^j) \le t$. Then we have the following result.

Theorem 6.8. Let $(\Theta_t, t \ge 0)$ denote the restriction to [k] of the Bolthausen-Sznitman coalescent. Then the process $(\Pi^k(t), 0 \le t < 1)$ converges in the sense of finite-dimensional distributions as $N \to \infty$ to the process $(\Theta_{-\log f(1-t)}, 0 \le t < 1)$, where for $0 \le x \le 1$,

$$f(x) = \frac{\sqrt{2\log 2}}{\beta\sqrt{\hat{A}'(x)}} \wedge 1,$$

$\hat{A}$ denotes the least concave majorant of A, and $\hat{A}'(x)$ denotes the right-derivative of $\hat{A}$ at x.
$\hat{A}$ is also known as the convex hull of A, since it is the function such that the region under the graph of $\hat{A}$ is the convex hull of the region under the graph of A: see Figure 13. The case where A takes only finitely many values effectively corresponds to the model discussed by Derrida (in the terminology of the spin glass literature, this represents finitely many energy levels), while the case where A contains a continuous part is the "Continuous Random Energy Model" analysed by Bovier and Kurkova. In what follows we will sketch a proof of this result in the case of a finite number of energy levels (in fact, with only two energy levels, to simplify things).

Figure 13: The convex hull of the function A.

Note that in Theorem 6.8, the Bolthausen-Sznitman coalescent arises regardless of β and A: the dependence on β and A is only through the time change. However, there are some degenerate cases. For example, if

$$\beta \le \beta_c := \sqrt{\frac{2\log 2}{\lim_{x \downarrow 0}\hat{A}'(x)}},$$

then $f(x) = 1$ for all x, and the Bolthausen-Sznitman coalescent gets evaluated at time zero, so there is no coalescence. In the physics language, β is the inverse temperature and $1/\beta_c$ is the critical temperature, above which there is no coalescence because we are not sampling enough from the values σ for which $X_\sigma$ is large.
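For a step function A (finitely many energy levels), the least concave majorant is the upper convex hull of finitely many points, so the time change of Theorem 6.8 and the threshold $\beta_c$ can be computed directly. The Python sketch below assumes the form $f(x) = \min\{1, \sqrt{2\log 2}/(\beta\sqrt{\hat{A}'(x)})\}$ with $\hat{A}'$ the right-derivative, and atoms $0 < x_1 < \dots < x_n = 1$; all function names are ours.

```python
import math


def concave_majorant(xs, masses):
    """Vertices of the least concave majorant of the step function
    A(x) = sum of the a_i with x_i <= x, for atoms 0 < x_1 < ... < x_n = 1.

    On [x_i, x_{i+1}) the step function is largest at x_i, so the majorant is the
    upper convex hull of (0, 0) and the points (x_i, A(x_i))."""
    points, acc = [(0.0, 0.0)], 0.0
    for x, a in zip(xs, masses):
        acc += a
        points.append((x, acc))
    hull = []
    for px, py in points:                       # monotone-chain upper hull
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # drop the middle point if it lies on or below the chord to (px, py)
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return hull


def right_slope(hull, x):
    """Right-derivative of the piecewise-linear majorant at x in [0, 1)."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x < x2:
            return (y2 - y1) / (x2 - x1)
    raise ValueError("x must lie in [0, 1)")


def f(x, beta, hull):
    """Assumed time change: min(1, sqrt(2 log 2) / (beta * sqrt(slope at x)))."""
    return min(1.0, math.sqrt(2 * math.log(2)) / (beta * math.sqrt(right_slope(hull, x))))


def critical_beta(hull):
    """beta_c = sqrt(2 log 2 / lim_{x -> 0} slope); for beta <= beta_c, f = 1."""
    return math.sqrt(2 * math.log(2) / right_slope(hull, 0.0))
```

For instance, with atoms of mass 1/2 at 1/4 and at 1, the majorant keeps both corners (slopes 2 and 2/3), while with atoms of mass 1/2 at 3/4 and at 1 the middle corner lies below the chord and the majorant is the diagonal.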
6.2.2 Some extreme value theory 

The first ingredient is a basic result from extreme value theory. To begin with the simplest case, suppose $X_1, X_2, \dots$ are i.i.d. with a standard normal distribution, and let $M_n = \max\{X_1, \dots, X_n\}$. Following Exercise 2.3 in [65] or Exercise 4.2.1 in [133], choose $b_n$ so that $P(X_1 > b_n) = 1/n$. Then $b_n \sim \sqrt{2\log n}$ and for all $x \in \mathbb{R}$,

$$\lim_{n\to\infty} P\big(b_n(M_n - b_n) \le x\big) = e^{-e^{-x}}. \qquad (163)$$
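Since the $X_i$ are independent, $P(b_n(M_n - b_n) \le x) = \Phi(b_n + x/b_n)^n$ exactly, where Φ is the standard normal distribution function, so (163) can be checked numerically without simulation. The Python sketch below uses the standard library's `statistics.NormalDist` (helper names are ours). At $n = 10^6$ the two sides agree to within a few hundredths, and $b_n/\sqrt{2\log n}$ is still noticeably below 1: both convergences are slow.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()          # mean 0, standard deviation 1


def b_n(n):
    """The threshold with P(X_1 > b_n) = 1/n for a standard normal X_1."""
    return -STD_NORMAL.inv_cdf(1.0 / n)


def rescaled_max_cdf(n, x):
    """Exact P(b_n (M_n - b_n) <= x) = Phi(b_n + x / b_n) ** n, computed in log
    space; the upper tail is evaluated as Phi(-y) to avoid cancellation."""
    b = b_n(n)
    tail = STD_NORMAL.cdf(-(b + x / b))
    return math.exp(n * math.log1p(-tail))


def gumbel_cdf(x):
    return math.exp(-math.exp(-x))
```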

Equation (163) is the famous result that the maximum of n independent normally distributed random variables has asymptotically a Gumbel (double exponential) distribution. Furthermore, because the random variables $X_i$ are independent, one can see from (163) that the expected number of the random variables $X_1, \dots, X_n$ that are greater than $b_n + x/b_n$ is approximately $e^{-x}$, and that the distribution of the number of such random variables should be approximately Poisson. More precisely, one can view the set of $X_i$ as a point process on the real line, and we can obtain a nontrivial limit by setting the origin to be where we roughly expect the maximum to be, i.e., at $b_n$, and rescaling by $b_n$. Indeed, as $n \to \infty$, we have the convergence of point processes

$$\sum_{i=1}^{n} \delta_{b_n(X_i - b_n)} \to \mathcal{P},$$

where $\mathcal{P}$ is a Poisson process with intensity $e^{-x}\,dx$. A version of this result is stated as Theorem 9.2.3 in [39]. We call $\mathcal{P}$ the exponential Poisson process. The exponential Poisson process enjoys several remarkable and crucial properties, which we now describe. Let $(T_k)_{k\ge1}$ be the points of a uniform rate 1 Poisson process on $[0, \infty)$, and let

$$Z_k = \log(1/T_k).$$

1. $\{Z_k\}_{k\ge1}$ forms an exponential Poisson process on $\mathbb{R}$.

2. For $\beta > 0$ and $c > 0$, the points $\{\eta_k = c\,e^{\beta Z_k}\}_{k\ge1}$ form a Poisson point process of intensity $\beta^{-1}c^{1/\beta}x^{-1-1/\beta}$ on $[0, \infty)$.

3. The points $\{Z_k + Y_k\}_{k\ge1}$, where the $Y_k$ are i.i.d. with density g, form a Poisson process with intensity h(x), where

$$h(x) = \int_{-\infty}^{\infty} e^{-z} g(x - z)\,dz = \int_{-\infty}^{\infty} e^{-(x - y)} g(y)\,dy = e^{-x}\,E[e^{Y}].$$

Thus if $V = \log E[e^{Y}]$, they form a translated exponential Poisson process, with the new origin taken to be equal to V.

4. If we superimpose independent Poisson processes, where the i-th has intensity $e^{-(x - V_i)}$, the resulting Poisson process has intensity f(x), where

$$f(x) = \sum_{i=1}^{\infty} e^{-(x - V_i)} = e^{-(x - V)}, \qquad V = \log\Big(\sum_{i=1}^{\infty} e^{V_i}\Big).$$
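Properties 2 and 3 are easy to test by simulation, using the representation $Z_k = \log(1/T_k)$. The Python sketch below (names are ours) estimates expected counts above a threshold and compares them with the integrals of the claimed intensities. For property 2 with c = 2, β = 1/2 and y = 3, the number of points $\eta_k > y$ has mean $\int_y^\infty \beta^{-1}c^{1/\beta}x^{-1-1/\beta}\,dx = (c/y)^{1/\beta} = 4/9$. For property 3 with standard normal $Y_k$, $V = \log E[e^{Y}] = 1/2$, and the number of points above 0 has mean $e^{1/2}$.

```python
import math
import random


def exponential_poisson_points(rng, t_max=60.0):
    """Points Z_k = log(1/T_k), where T_1 < T_2 < ... are the arrival times of a
    rate-1 Poisson process.  Arrivals after t_max are dropped; they would only give
    points Z_k < -log(t_max), far below the thresholds used here."""
    points, t = [], 0.0
    while True:
        t += rng.expovariate(1.0)
        if t >= t_max:
            return points
        points.append(-math.log(t))


def mean_count(transform, threshold, reps, rng):
    """Monte Carlo estimate of E[#{k : transform(Z_k) > threshold}]."""
    total = 0
    for _ in range(reps):
        total += sum(1 for z in exponential_poisson_points(rng) if transform(z) > threshold)
    return total / reps
```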
6.2.3 Sketch of proof 

We are now ready to discuss a sketch of the proof of Theorem 6.8. We assume that A has finitely many energy levels, i.e., A is the distribution function of a probability measure with n atoms at positions $0 < x_1 < \dots < x_n$, say, with respective masses $a_1, \dots, a_n$. Thus we assume that $\sum_i a_i = 1$ and that $a_i > 0$ for all $1 \le i \le n$. Without loss of generality we may assume that $x_n = 1$, and we let $x_0 = 0$.

We slightly change our notation for a spin configuration $\sigma \in S_N$: we now write it as $\sigma = \sigma_1 \cdots \sigma_n$, where $\sigma_i \in S_{N(x_i - x_{i-1})}$, i.e., $\sigma_i$ consists of $N(x_i - x_{i-1})$ spins. (Here we do not worry about the fact that $N(x_i - x_{i-1})$ is not necessarily an integer.) Then it is easy to see that the random field $X_\sigma$ on $S_N$ may be explicitly constructed as follows: for all $1 \le k \le n$ and all $\sigma_1, \dots, \sigma_k$, let $X_{\sigma_1 \cdots \sigma_k}$ be i.i.d. standard Gaussian random variables. Then define $X_\sigma$ to be

$$X_\sigma = \sqrt{a_1}\,X_{\sigma_1} + \sqrt{a_2}\,X_{\sigma_1\sigma_2} + \dots + \sqrt{a_n}\,X_{\sigma_1 \cdots \sigma_n}. \qquad (165)$$

Indeed, one can check directly from the above formula that $X_\sigma$ has the correct covariance structure: if σ and τ share exactly their first m blocks, then $\operatorname{cov}(X_\sigma, X_\tau) = a_1 + \dots + a_m = A(1 - d(\sigma, \tau))$. (And it is naturally a Gaussian field, being a linear combination of i.i.d. standard Gaussian random variables.)
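The construction (165) can be checked mechanically: writing $X_\sigma$ as a combination of independent standard Gaussians indexed by prefixes of σ, the covariance of $X_\sigma$ and $X_\tau$ is the sum of the $a_k$ over the prefixes shared by σ and τ. The Python sketch below (our own encoding, with `cuts[k]` standing for $N x_k$) verifies (161) exhaustively for a small example with N = 4 spins and three levels.

```python
import math
from fractions import Fraction
from itertools import product


def coefficients(sigma, cuts, masses):
    """Write X_sigma from (165) as sum_k sqrt(a_k) * Z[prefix of sigma of length cuts[k]],
    where the Z's are independent standard Gaussians indexed by prefixes.
    Returns the coefficient of each Z."""
    return {tuple(sigma[:c]): math.sqrt(a) for c, a in zip(cuts, masses)}


def covariance(sigma, tau, cuts, masses):
    """cov(X_sigma, X_tau): only the Gaussians indexed by shared prefixes contribute."""
    cs, ct = coefficients(sigma, cuts, masses), coefficients(tau, cuts, masses)
    return sum(c * ct[key] for key, c in cs.items() if key in ct)


def distance(sigma, tau):
    """Genealogical distance (160): 1 minus the normalised length of the common prefix."""
    n, shared = len(sigma), 0
    while shared < n and sigma[shared] == tau[shared]:
        shared += 1
    return 1 - Fraction(shared, n)


def A_step(x, cuts, masses, n):
    """Step function with a jump of size a_k at x_k = cuts[k] / n."""
    return sum(a for c, a in zip(cuts, masses) if Fraction(c, n) <= x)


def all_configurations(n):
    return list(product((-1, 1), repeat=n))
```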

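Assume to simplify that n = 2, so that $0 < x_1 < x_2 = 1$. For $\sigma_1 \in S_{Nx_1}$, let $\eta_{\sigma_1} = e^{\beta\sqrt{Na_1}\,X_{\sigma_1}}$. By extreme value theory, there is a constant $b_N$ such that the points $b_N(X_{\sigma_1} - b_N)$ converge to a Poisson process with intensity $e^{-x}$. Also, because there are $2^{Nx_1}$ of the random variables $X_{\sigma_1}$, we have $b_N \sim \sqrt{2\log 2^{Nx_1}} = \sqrt{(2\log 2)Nx_1}$. Therefore, by property 2 of the exponential Poisson process, the points $\eta_{\sigma_1}$, $\sigma_1 \in S_{Nx_1}$, form approximately a Poisson process with intensity $C_N x^{-1-1/\beta_1}$, where $C_N > 0$ depends on N, $a_1$ and $x_1$ but not on x, and

$$\beta_1 = \frac{\beta\sqrt{Na_1}}{b_N} \approx \beta\sqrt{\frac{a_1}{(2\log 2)\,x_1}}.$$

Similarly, for each fixed $\sigma_1 \in S_{Nx_1}$ and all $\sigma_2 \in S_{N(x_2 - x_1)}$, let $\eta_{\sigma_1\sigma_2} = e^{\beta\sqrt{Na_2}\,X_{\sigma_1\sigma_2}}$. Then, by (165),

$$e^{\beta\sqrt{N}X_\sigma} = \eta_{\sigma_1}\,\eta_{\sigma_1\sigma_2}.$$

By extreme value theory again, the points $\eta_{\sigma_1\sigma_2}$ form approximately a Poisson process with intensity $C'_N x^{-1-1/\beta_2}$, where

$$\beta_2 = \beta\sqrt{\frac{a_2}{(2\log 2)(x_2 - x_1)}}.$$

It follows that for each $\sigma_1$, the points $\eta_{\sigma_1}\eta_{\sigma_1\sigma_2}$ form a Poisson process of intensity $C'_N\,\eta_{\sigma_1}^{1/\beta_2}\,x^{-1-1/\beta_2}$. Therefore, if we consider all points of the form $\eta_{\sigma_1}\eta_{\sigma_1\sigma_2} = e^{\beta\sqrt{N}X_\sigma}$, they form a Poisson process with intensity $A x^{-1-1/\beta_2}$, where $A = C'_N\sum_{\sigma_1}\eta_{\sigma_1}^{1/\beta_2}$, and once we condition on this entire Poisson process, the probability that a given point sampled from the Gibbs distribution $\mu_\beta(\sigma) = Z^{-1}e^{\beta\sqrt{N}X_\sigma}$ belongs to the "family" associated with a particular $\sigma_1$ is proportional to $\eta_{\sigma_1}^{1/\beta_2}$.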

This gives us the following picture for the genealogy of the process. First, we sample k of the values $X_\sigma$ with probability proportional to $e^{\beta\sqrt{N}X_\sigma}$. We are likely to sample the same point more than once: in fact, as discussed above, sampling according to $e^{\beta\sqrt{N}X_\sigma}$ is approximately the same as sampling points from a Poisson point process $\mathcal{P}$ with intensity $Ax^{-1-1/\beta_2}$, with weight proportional to x. If we identify the samples which come from identical points, this gives us an exchangeable partition $\Pi_0$, where the frequency of the block corresponding to the point $x \in \mathcal{P}$ is proportional to x. Thus the distribution of the ranked frequencies of this exchangeable partition is given by the ranked components of

$$\left(\frac{x_i}{\sum_{j\ge1} x_j},\ i \ge 1\right), \qquad (167)$$

where $(x_i)_{i\ge1}$ are the points of $\mathcal{P}$, and since $\mathcal{P}$ has intensity proportional to $x^{-1-1/\beta_2}$, we conclude by the Poisson construction of Poisson-Dirichlet (α, 0) partitions (Theorem 1.7) that the vector (167) has the same distribution as the ranked coordinates of a Poisson-Dirichlet random variable with parameters $\alpha = 1/\beta_2$ and $\theta = 0$. Thus $\Pi_0$ has the $\mathrm{PD}(1/\beta_2, 0)$ distribution, and $\Pi^k(0)$ has approximately the same distribution as the restriction to [k] of a $\mathrm{PD}(1/\beta_2, 0)$ random partition.
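The Poisson construction used here is also easy to sketch. By property 2 with c = 1 and β = 1/α, the points $T_k^{-1/\alpha}$ form a Poisson process with intensity $\alpha x^{-1-\alpha}$, and normalizing them gives ranked PD(α, 0) frequencies. The Python sketch below truncates to the 500 largest points (an approximation of ours) and checks that $E[\sum_i P_i^2]$, the probability that two independently sampled points coincide, is close to $1 - \alpha$, its value under PD(α, 0).

```python
import random


def pd_frequencies(alpha, rng=random, n_points=500):
    """Approximate ranked PD(alpha, 0) frequencies via the Poisson construction.

    By property 2 (with c = 1, beta = 1/alpha), the points T_k^{-1/alpha}, where
    T_1 < T_2 < ... are rate-1 Poisson arrival times, form a Poisson process with
    intensity alpha * x^{-1-alpha}.  Normalising by their sum gives PD(alpha, 0);
    keeping only the n_points largest points is a truncation."""
    t, points = 0.0, []
    for _ in range(n_points):
        t += rng.expovariate(1.0)
        points.append(t ** (-1.0 / alpha))
    total = sum(points)
    return [p / total for p in points]        # decreasing, since T_k increases
```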

Going back to the previous level, note that a given sample $x \in \mathcal{P}$, chosen with weight proportional to its value x, comes from the "family" generated by $\sigma_1$ with probability proportional to $\eta_{\sigma_1}^{1/\beta_2}$, and the points $\eta_{\sigma_1}^{1/\beta_2}$ form a Poisson process with intensity proportional to $x^{-1-\beta_2/\beta_1}$. Thus if we sample from $\mathcal{P}$ and identify the points that come from the same $\sigma_1$, we obtain an exchangeable partition $\Pi_1$ whose ranked frequencies have the same distribution as those of a Poisson-Dirichlet random variable with parameters $\alpha = \beta_2/\beta_1$ and $\theta = 0$. Thus $\Pi_1$ has the $\mathrm{PD}(\beta_2/\beta_1, 0)$ distribution.

Thus, taking $t = x_2 - x_1$, we obtain $\Pi^k_N(t)$ by taking every block of $\Pi^k_N(0)$ (which is a $\mathrm{PD}(1/\beta_2, 0)$ random variable restricted to $[k]$) and coagulating them according to a $\mathrm{PD}(\beta_2/\beta_1, 0)$ random variable. We claim that the resulting random partition is nothing but a $\mathrm{PD}(1/\beta_1, 0)$ random variable. There are many ways to see this; one of them is precisely to use the fact that the Bolthausen-Sznitman coalescent at time $t$ has the $\mathrm{PD}(e^{-t}, 0)$ distribution (Theorem 6.2). Indeed, by the Markov property of the Bolthausen-Sznitman coalescent, we see that when we coagulate a $\mathrm{PD}(e^{-s}, 0)$ partition with an independent $\mathrm{PD}(e^{-t}, 0)$ partition, we must obtain a $\mathrm{PD}(e^{-(s+t)}, 0)$ random partition. Taking $e^{-s} = 1/\beta_2$ and $e^{-t} = \beta_2/\beta_1$ gives exactly $e^{-(s+t)} = 1/\beta_1$.
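The coagulation identity used in this argument can be sanity-checked on one simple statistic. If the blocks of a partition with frequencies $(P_i)$ are merged according to an independent partition with frequencies $(Q_m)$, two sampled individuals end up in the same block either because they were already together (probability $1 - a$ when the first partition is $\mathrm{PD}(a, 0)$) or because their two distinct blocks receive the same label (probability $a(1 - b)$ when the labelling partition is $\mathrm{PD}(b, 0)$). The total, $1 - ab$, is the corresponding value for $\mathrm{PD}(ab, 0)$, as it should be. The Python sketch below, with hypothetical helper names and a truncated version of the Poisson construction of Theorem 1.7, checks this by simulation; it only tests one moment, not the full law.

```python
import numpy as np

def pd_alpha_zero(alpha, n_points, rng):
    """Ranked PD(alpha, 0) frequencies from a truncated Poisson construction:
    points Gamma_i**(-1/alpha), Gamma_i the arrival times of a rate-one Poisson process."""
    points = np.cumsum(rng.exponential(size=n_points)) ** (-1.0 / alpha)
    return points / points.sum()

def coagulate(p, q, rng):
    """Merge the blocks with frequencies p according to a partition with
    frequencies q: every block of p receives an i.i.d. label drawn from q,
    and blocks sharing a label are merged (their frequencies are added)."""
    labels = rng.choice(q.size, size=p.size, p=q)
    return np.bincount(labels, weights=p, minlength=q.size)

def coagulated_same_block_probability(a, b, n_rep=2000, n_points=2000, seed=0):
    """Monte Carlo estimate of E[sum of squared merged frequencies]; 1 - a*b expected."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_rep):
        merged = coagulate(pd_alpha_zero(a, n_points, rng),
                           pd_alpha_zero(b, n_points, rng), rng)
        total += np.sum(merged ** 2)
    return total / n_rep
```

With $a = 1/\beta_2$ and $b = \beta_2/\beta_1$, the target value $1 - ab$ is $1 - 1/\beta_1$, the same-block probability of $\mathrm{PD}(1/\beta_1, 0)$.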

Thus we can write, for $t = t_2 = 0$ and $t = t_1 = x_2 - x_1$, $\Pi^k_N(t_i) \approx \mathrm{PD}(1/\beta_i, 0)$ with $i = 1, 2$, where

$$\frac{1}{\beta_i} = \frac{1}{\beta}\sqrt{\frac{(2\log 2)(x_i - x_{i-1})}{a_i}} = \frac{1}{\beta}\sqrt{\frac{2\log 2}{A'(1-t_i)}} = f(1-t_i) = e^{-(-\log f(1-t_i))}.$$

Thus for $i = 1, 2$, we have shown that $\Pi^k_N(t_i)$ has approximately the same distribution as $\Pi^k(-\log f(1-t_i))$, the restriction to $[k]$ of the Bolthausen-Sznitman coalescent at time $-\log f(1-t_i)$, as claimed in Theorem 6.8. Note that this argument doesn't really explain how lineages coalesce between the different energy levels, and this is why we only get convergence in the sense of finite-dimensional marginals in Theorem 6.8.
6.3 Complements

6.3.1 Neveu's branching process 

The intuitive picture presented here essentially goes back to the work of Ruelle [138], who discusses probability cascades in connection with the properties of the exponential Poisson process. Bolthausen and Sznitman [37] then realised that reversing the direction of time defined the remarkable coalescent process which now bears their names. Bertoin and Le Gall [29], in their first joint paper on coalescence, showed that the Bolthausen-Sznitman coalescent process was embedded in the genealogy of a certain continuous-state branching process (CSBP), namely the CSBP associated with the branching mechanism

$$\psi(u) = u\log u, \qquad u > 0.$$

This CSBP is known as Neveu's branching process. This was the first paper showing a relation between the genealogy of a CSBP and a $\Lambda$-coalescent, and was a partial motivation for the papers [36, 20, 18]. However, in the case of Neveu's branching process, the relation between the genealogy and the coalescent is trivial, in the sense that there is no time-change. Bertoin and Le Gall's original approach relied on a precursor to their flow of bridges discussed in Theorem 3.14. The ideas outlined in Theorem 4.9, which come from [18], provide a direct alternative route (more precisely, the approach of Theorem 4.10 shows that the point process $(t, \Delta Z/Z)$ arising from the genealogy of Neveu's branching process and that of the Bolthausen-Sznitman coalescent are identical). That Neveu's branching process was related to Derrida's GREM was first realized by Neveu in [126], in a paper which is unfortunately unpublished, even though in hindsight it inspired many subsequent developments in the field. The link with extreme value theory is also discussed in that paper.

6.3.2 Sherrington-Kirkpatrick model 

The Generalized random energy model (GREM) was proposed by Derrida in [60] and [61] as a possible simplification of the celebrated Sherrington-Kirkpatrick model. The Sherrington-Kirkpatrick (SK) spin-glass model is similar to the GREM, the difference being that we use the Hamming distance

$$d_N(\sigma, \tau) = \frac{1}{N}\,\#\{1 \le i \le N : \sigma_i \neq \tau_i\},$$

also known as the overlap between $\sigma$ and $\tau$, where $\sigma_i$ and $\tau_i$ denote the $i$th coordinates of $\sigma$ and $\tau$ respectively. (Also, the SK model typically refers to the case $A(x) = x$, but other covariance functions have also been studied.) Here $d_N$ is a metric but not an ultrametric. Because $d_N$ is not an ultrametric, it is not clear that it even makes sense to define a coalescent process as was done for the GREM. However, it is widely conjectured that if we consider $k$ points $\sigma_1, \ldots, \sigma_k$ chosen at random from $S_N$ according to the Gibbs measure, the distances $d_N(\sigma_i, \sigma_j)$ between them have the ultrametric property in the limit as $N \to \infty$, which means that they can be viewed as points on the boundary of a tree equipped with the genealogical metric. Talagrand devotes Section 4 of [148] to "the ultrametricity conjecture" for the Sherrington-Kirkpatrick model and refers to ultrametricity as "one of the most famous predictions about spin glasses." Derrida's insight consisted in imposing ultrametricity directly in the model and analyzing what comes out of it. Remarkably enough, this simple addition makes the model much more tractable and fits the physicists' predictions about the SK model perfectly. See the monograph by Bovier [39] for much material related to this field, and see also the lectures by Bolthausen in [38]. The ultrametricity conjecture goes back to Parisi [128]. We note that an important prediction which follows from the ultrametricity conjecture is a series of identities, known as the Ghirlanda-Guerra identities, which have been proved rigorously by Ghirlanda and Guerra [88] (in a slightly weaker form than predicted).

Much of the magic of the emergence of the Bolthausen-Sznitman coalescent in these spin glass models boils down to the crucial stability properties of the exponential Poisson process (under superposition, addition of noise, etc.). It is natural to guess that this process is, in some sense, the only point process which enjoys these properties. While this would be an attractive route to the ultrametricity conjecture, it seems to be a very difficult problem. We refer the reader to the recent work by Aizenman and Arguin [1] as well as references therein.

6.3.3 Natural selection and travelling waves 

As was discussed in the proof of the Bovier-Kurkova theorem, Derrida's GREM may be viewed as an assignment of Gaussian random variables to the leaves of the binary tree of depth $N$, with a covariance structure which depends on the genealogical metric between these leaves. There is one natural model where such correlation structures arise, namely branching random walks where the step distribution is a standard Gaussian random variable, and where at each step every individual branches into exactly two particles. That is, start with one particle at time 0. At each time step, particles divide in two and take i.i.d. jumps given by a prescribed distribution (which here is Gaussian). Thus at time $N$, there are $2^N$ particles, whose respective positions rescaled by $\sqrt N$ form a centered Gaussian field $X_\sigma$, with covariance $\mathrm{cov}(X_\sigma, X_\tau)$ given by the following formula: if the most recent common ancestor of the particles labelled $\sigma$ and $\tau$ is at generation $j$, at position $S_j$, then there exist independent Gaussian variables $\mathcal N$ and $\mathcal N'$ (independent of $S_j$, each with variance $N - j$) such that $X_\sigma = N^{-1/2}(S_j + \mathcal N)$ and $X_\tau = N^{-1/2}(S_j + \mathcal N')$, so:

$$\mathrm{cov}(X_\sigma, X_\tau) = \mathbb E(X_\sigma X_\tau) = \frac{1}{N}\,\mathbb E\big[(S_j + \mathcal N)(S_j + \mathcal N')\big] = \frac{\mathbb E(S_j^2)}{N} = \frac{j}{N}.$$

In particular, since the genealogical distance is $d(\sigma, \tau) = 1 - j/N$, we may write $\mathrm{cov}(X_\sigma, X_\tau) = A(1 - d(\sigma, \tau))$ with $A(x) = x$. Thus Gaussian branching random walks give a natural construction of a random energy landscape of the kind considered in the random energy model. Unfortunately, this is a degenerate case from the point of view of the application of Theorem 6.8, as $A'(x) = 1$ for all $x \in [0, 1]$. Nevertheless, we learn from this simple calculation that the energy landscape defined in the GREM may be viewed as a form of perturbation of branching random walks, with a rather complex covariance structure. Theorem 6.8 then asks about the genealogy of this system of particles.
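The covariance computation above can be checked by simulation. In the Python sketch below (an illustration only; the depth, the number of replicates and the labelling are our own choices), the $2^N$ leaves are labelled so that the children of node $m$ at one generation are nodes $2m$ and $2m + 1$ at the next. The generation $j$ of the most recent common ancestor of two leaves is then the length of the common prefix of their $N$-bit binary labels, and the empirical covariance should be close to $j/N$.

```python
import numpy as np

def leaf_positions(depth, n_rep, rng):
    """Rescaled leaf positions X = S / sqrt(depth) of a binary branching random
    walk with N(0, 1) steps; shape (n_rep, 2**depth), one row per independent copy."""
    pos = np.zeros((n_rep, 1))                 # one particle at 0 at time 0
    for _ in range(depth):
        pos = np.repeat(pos, 2, axis=1)        # node m has children 2m and 2m + 1
        pos += rng.standard_normal(pos.shape)  # i.i.d. Gaussian jump for every child
    return pos / np.sqrt(depth)

def mrca_generation(a, b, depth):
    """Generation of the most recent common ancestor of leaves a and b, i.e. the
    length of the common prefix of their depth-bit binary labels."""
    return depth - (a ^ b).bit_length()

depth = 8
X = leaf_positions(depth, n_rep=4000, rng=np.random.default_rng(0))
for tau in (1, 2, 16, 128):
    j = mrca_generation(0, tau, depth)
    print(j, np.mean(X[:, 0] * X[:, tau]), j / depth)   # empirical vs predicted j/N
```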

Recently, Brunet, Derrida, Mueller and Munier [46, 47] have introduced a particle system of this kind and made fascinating predictions about its genealogy. Rather remarkably, this model also has an interpretation in terms of a population model with selection, which we now describe. As in the Moran model, the population size is kept constant, equal to $N$. An individual is represented by her fitness, a real number measuring the likelihood that this individual will produce offspring surviving into the next generation. Thus, the population at time $t$ may be described by a cloud of $N$ points $X_1(t), \ldots, X_N(t)$ on the real line, ordered in some arbitrary fashion, say linearly.

Figure 14: The two steps of the Brunet-Derrida model with $N = 4$: at time $t + 1/2$, state of the population after the branching step (there are $2N$ individuals). At time $t + 1$, state of the population after the selection step. Only the $N$ largest particles survive.

The model has discrete generations and is Markovian.
The evolution from one generation $t$ to the next at time $t + 1$ consists of two steps: branching and selection. Thus we have an intermediate state, which we may call $t + 1/2$, where every individual gives a number of offspring (let us fix this number to be equal to 2 for every individual, although one may think of a random rule as well). The positions of the offspring of individual $i$ are denoted by $X_i^1(t + 1/2), X_i^2(t + 1/2)$ and are obtained by:

$$X_i^j(t + 1/2) = X_i(t) + \mathcal N_i^j, \qquad j = 1, 2,$$

where the $\mathcal N_i^j$ are independent random variables with a fixed continuous distribution (say Gaussian). At this stage there are thus $2N$ individuals, and so the next step, which is the selection step, reduces the population size to $N$ by keeping the $N$ largest particles from the population at time $t + 1/2$. Formally, for $1 \le k \le N$ we put $X_k(t + 1) = Y$, where $Y$ is the offspring position such that

$$\#\{i : X_i^1(t + 1/2) > Y\} + \#\{i : X_i^2(t + 1/2) > Y\} = k - 1.$$

An illustration of the model is given in Figure 14: note the similarity with the Galton-Watson model of Schweinsberg in Theorem 3.8 (the difference being that here selection is based on fitness, whereas there selection was made at random).
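A minimal Python sketch of the branching and selection steps just described, assuming Gaussian $N(0, 1)$ displacements and exactly two offspring per individual (the function names are ours). Besides the new positions, it returns for each survivor the index of its parent, which is the information needed to trace the ancestral partition process backwards in time.

```python
import numpy as np

def branching_selection_step(x, rng):
    """One generation of the model: branching, then selection.

    x holds the N current positions. Each particle has two children displaced
    by i.i.d. N(0, 1) variables; the N largest of the 2N children survive.
    Returns the new positions (in increasing order) and, for each survivor,
    the index in x of its parent.
    """
    n = x.size
    parent = np.repeat(np.arange(n), 2)                 # parent of each of the 2N children
    children = x[parent] + rng.standard_normal(2 * n)   # branching step (time t + 1/2)
    keep = np.argsort(children)[n:]                     # selection: the N largest children
    return children[keep], parent[keep]

def simulate(n, steps, seed=0):
    """Run the model from N particles at 0 and return the final positions."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n)
    for _ in range(steps):
        x, _ = branching_selection_step(x, rng)
    return x

x = simulate(n=100, steps=500)
print("empirical speed of the bulk:", x.mean() / 500)
```

With unit-variance Gaussian steps and binary branching, the selected population can never outrun the maximum of the branching random walk without selection, whose asymptotic speed is $\sqrt{2\log 2} \approx 1.18$, so the printed speed should fall below that value.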

The interest of [46, 47] is in the genealogy of an arbitrarily large but fixed sample of the population when its size tends to infinity, i.e., in the scaling limits of the ancestral partition process $(\Pi^k_N(t), t \ge 0)$ (to use the same terminology as in the first sections of these notes). Using convincing but not fully rigorous arguments, they conjecture that the correct time scale for the ancestral partition process is roughly $(\log N)^3$. More precisely, they conjecture:

Conjecture 6.1. The ancestral partition process, sped up by a factor $(\log N)^3$, converges to the Bolthausen-Sznitman coalescent. That is, for all $k \ge 1$,

$$\left(\Pi^k_N\big(t(\log N)^3\big), t \ge 0\right) \longrightarrow \left(\Pi^k(t), t \ge 0\right)$$

in the sense of finite-dimensional distributions, where $\Pi^k$ denotes the restriction to $[k]$ of a Bolthausen-Sznitman coalescent.

This conjecture is accompanied by a very precise picture of what leads to this behaviour. Essentially, the cloud of particles is thought to travel to the right with a positive speed $v_N$, where $v_N \to 2$ as $N \to \infty$ (and there are some conjectures on the first and second correction terms). The particles stay fairly compact, with a width of no more than $O(\log N)$ at any time. Occasionally (every $(\log N)^3$ units of time), a particle travels far to the right, at distance approximately $3\log\log N + O(1)$ away from the "bulk" of the population. A particle which does so stands a good chance of keeping all its offspring in the next generation after the selection step, and so its descendants quickly generate a large fraction of the population, say $p > 0$. This leads to a $p$-merger in the ancestral partition process. Thus the multiple collisions only arise when one takes the scaling limit, speeding up time by $(\log N)^3$.

The derivation of the characteristic time scale comes from a comparison argument with a stochastic PDE called the stochastic Fisher-KPP equation (for Kolmogorov, Petrovsky and Piscunov), which has the following form:

$$\frac{\partial u}{\partial t} = \Delta u + u(1 - u) + \varepsilon\sqrt{u(1 - u)}\,\dot W, \qquad (168)$$

where $\dot W$ is a white noise. If one removes the noise from this equation (i.e., if $\varepsilon = 0$), one obtains the standard Fisher-KPP equation, which is at the heart of the theory of reaction-diffusion partial differential equations. This equation was first obtained independently by Kolmogorov et al. [109] and Fisher [84], the latter to describe the spread of an advantageous gene in a population. It is known that this equation admits travelling wave solutions, i.e., solutions of the form $u(t, x) = F(x - vt)$ where $v > 0$. For certain well-chosen initial conditions, the speed of this wave will always be equal to $v = 2$. The idea of [46] is that the distribution function for the population at time $t$ behaves approximately as a solution to (168) started from the state $u(0, x) = \mathbf 1_{\{x < 0\}}$. In the presence of noise, equation (168) generates random travelling waves, which move to the right with a speed $v_\varepsilon$ such that $v_\varepsilon \to 2$ as $\varepsilon \to 0$. The asymptotic correction $v_\varepsilon - v$ was studied by Brunet and Derrida [43, 44, 45] using non-rigorous methods. They conjectured:



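$$v_\varepsilon - v \sim -\frac{\pi^2}{4\log^2\varepsilon}, \qquad (169)$$

and [46, 47] predicted a second term

$$v_\varepsilon - v \approx -\frac{\pi^2}{4\log^2\varepsilon} + \frac{3\pi^2\log|\log\varepsilon|}{4|\log\varepsilon|^3}. \qquad (170)$$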

Recently, Mueller, Mytnik and Quastel [125] managed to prove (169) rigorously and to give upper and lower bounds matching (170) up to constants. As the reader has surely guessed, it is this second term (with cubic exponent in $|\log\varepsilon|$) which is the most relevant for Conjecture 6.1.
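To see how slowly these corrections vanish, one can evaluate the two terms of (170) directly. The Python helper below is plain arithmetic on the formulas as written (the function name is ours); it says nothing about the validity of the predictions themselves.

```python
import math

def speed_corrections(eps):
    """The two terms of the predicted correction v_eps - v in (170)."""
    L = abs(math.log(eps))
    first = -math.pi ** 2 / (4 * L ** 2)
    second = 3 * math.pi ** 2 * math.log(L) / (4 * L ** 3)
    return first, second

for eps in (1e-5, 1e-10, 1e-20, 1e-50):
    first, second = speed_corrections(eps)
    print(f"eps = {eps:.0e}: first term {first:.6f}, second term {second:.6f}, "
          f"ratio {second / first:.3f}")
```

The ratio of the two terms is $-3\log|\log\varepsilon|/|\log\varepsilon|$, which is still about $-0.12$ at $\varepsilon = 10^{-50}$, so the second term is far from negligible at any realistic noise level.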

We note that Bérard [14], and Bérard and Gouéré [15], have recently studied a discrete version of the Brunet and Derrida model (with particles' locations on $\mathbb Z$ rather than $\mathbb R$, and selection at random in case of a tie), and were able to show that for each $N$, the system of particles travels at a well-defined speed $v_N$. Furthermore, the second paper [15] showed that $v_N - v_\infty \sim -a(\log N)^{-2}$ as $N \to \infty$, where $v_\infty = \lim_{N \to \infty} v_N$, for some explicit $a > 0$ depending solely on the step distribution. This improved on the earlier paper [14], which showed that $(\log N)^{-2}$ was the correct order of magnitude for the correction to the speed. This result relied crucially on some recent progress by Gantert, Hu and Shi [87] on the near-critical behaviour of branching random walks.

Simon and Derrida [147] have considered a model of branching Brownian motion with an absorbing wall and critical drift. They showed (using non-rigorous arguments) that, when conditioned upon survival for a long time, this system has a genealogy which is also governed by the Bolthausen-Sznitman asymptotics, as in Conjecture 6.1. Brunet, Derrida and Simon [48] have also used this theory to describe certain mean-field models of random polymers in (1+1) dimensions at zero temperature and found a similar behaviour, thus further confirming the universal nature of the Bolthausen-Sznitman coalescent. The work in progress [21] partly confirms these findings for some related models.
A Appendix: Excursions and Random Trees

What follows is a crash course on some deep ideas due essentially to 
Aldous, Le Gall and Le Jan in the 90's which relate excursions of 
random processes (above or below a fixed level) to some random trees 
which enjoy certain branching properties and in which branching 
occurs at a dense set of times (or levels). The archetypical example 
is Aldous' Continuum Random Tree and its relation to the Brownian 
excursion and the Ray-Knight theorem on the local times of reflecting 
Brownian motions. We start by recalling the fundamentals of Ito's 
excursion theory for Brownian motion as this formalism is central 
to the study of continuum random trees. We then briefly explain 
the relation between random trees and random paths, and finally 
explain how these trees are related to the genealogy of CSBPs and 
the lookdown process. 

A.1 Excursion theory for Brownian motion

Let $(B_t, t \ge 0)$ be a one-dimensional standard Brownian motion. The excursion theory of Brownian motion is one of the best tools to study fine properties of $B$. However, the basic idea behind the theory is extremely simple. We call an excursion $e$ of the Brownian motion $B$ a process $(e(t), t \ge 0)$ such that there exist $L < R$ with

$$e(t) = B_{(L+t) \wedge R}$$

and, for $t \in [L, R]$, $B_t = 0$ if and only if $t = L$ or $t = R$. That is, $e$ is the piece of $B$ between times $L$ and $R$, which are two consecutive zeros of $B$. The state space of excursions is $\Omega^*$, the space of continuous functions from $\mathbb R_+$ to $\mathbb R$ such that there exists $\zeta > 0$ satisfying:

1. $e_t = 0$ if $t \ge \zeta$;

2. for $t \in [0, \zeta]$, $e_t = 0$ if and only if $t = 0$ or $t = \zeta$.

$\zeta$ is called the lifetime of the excursion, or its length. The basic idea behind the theory is that one can construct Brownian motion by "throwing down independent excursions" and concatenating them. The result should be, indeed, a Brownian path.

Of course, it is a little tricky to make this intuition rigorous at first, but it turns out that we can use the language of point processes to express this idea: we will view the collection of excursions of Brownian motion as a Poisson point process on the set $\Omega^*$, and the intensity of the process is a measure called Ito's excursion measure. However, to say this properly, we must look at excursions in the correct time-scale, that is, the time-scale at which we are "adding a new excursion". This time-scale is that of the inverse local time process, since local time increases precisely at times when the process hits zero, and thus begins a new excursion. We will refresh the reader's memory about these notions below.

A.1.1 Local times

It is well known that Brownian motion spends an amount of time which has zero Lebesgue measure at any given point: for instance, if $T = \int_0^\infty \mathbf 1_{\{B_s = 0\}}\,ds$, then by Fubini's theorem

$$\mathbb E(T) = \int_0^\infty \mathbb P(B_s = 0)\,ds = 0,$$

and so $T = 0$ almost surely. In fact this argument obviously generalizes to sets $A$ with zero Lebesgue measure: let $T(A) = \int_0^t \mathbf 1_{\{B_s \in A\}}\,ds$ be the time spent by Brownian motion up to time $t$ in a given Borel subset $A$ of the real line; then if $|A| = 0$, $T(A) = 0$ almost surely. Since $T(A)$ is easily seen to be a (random) measure, we get immediately, by the Radon-Nikodym theorem, that there exists almost surely a derivative of $T$ with respect to the Lebesgue measure $dx$:

Definition A.1. We set

$$L(t, x) = \frac{dT}{dx}(x) \qquad (171)$$

almost surely. $L(t, x)$ is called the local time of $B$ at time $t$ and position $x$ (or level $x$).

This definition is nice because it is quite intuitive, but it is not very satisfactory because of the "almost surely" in this definition: this only defines $L(t, x)$ for fixed $t$, almost surely and for almost every $x$. It turns out that

Proposition A.1. There exists almost surely a jointly continuous process $(L(t, x))_{t \ge 0, x \in \mathbb R}$ for which (171) holds for all $t$ simultaneously.
This can be seen through Kolmogorov's continuity criterion. Because $L$ is the Radon-Nikodym derivative of the occupation measure $T(A)$, and because it is continuous in $x$ and $t$, there are a couple of properties that follow immediately. The most useful is the approximation:

Theorem A.1. For every $t > 0$, as $\varepsilon \to 0$, we have the following almost sure convergence:

$$\frac{1}{2\varepsilon}\int_0^t \mathbf 1_{\{|B_s - x| < \varepsilon\}}\,ds \longrightarrow L(t, x). \qquad (172)$$

We generally focus on level $x = 0$, in which case we almost always abbreviate $L_t = L(t, 0)$. From the approximation (172), it follows that $t \mapsto L_t$ is a nondecreasing function, and may only increase at times $t$ such that $B_t = 0$. That is, let $dL_t$ be the Stieltjes measure defined by the nondecreasing function $t \mapsto L_t$; then

$$\mathrm{Supp}(dL_t) \subset \mathcal Z, \qquad (173)$$

where $\mathcal Z$ is the zero set of $B$. Both sides of (173) are closed sets, so it is natural to conjecture that there is in fact equality. This turns out to be true, but it requires some non-trivial arguments. In fact, the proof relies on a famous identity due to Paul Lévy, which states the following:

Theorem A.2. For every $t > 0$, denote by $S_t = \max_{s \le t} B_s$ the running maximum of Brownian motion. Then $(L_t, t \ge 0)$ and $(S_t, t \ge 0)$ have the same distribution as processes.

This identity is in fact more general than this: the identity stated above may be viewed as an identification (the maximum of $B$ is the local time of a different Brownian motion $B'$), and in this identification $S_t - B_t$ is equal to the reflected Brownian motion $|B'_t|$. Thus we have the bivariate identity:

$$\{(S_t, S_t - B_t), t \ge 0\} \overset{d}{=} \{(L_t, |B_t|), t \ge 0\}. \qquad (174)$$

To save time, we do not give the proof of this result even though it is in fact quite elementary. (Most proofs in textbooks such as [136] use the so-called Skorokhod equation, but in fact, the identity may already be seen at the discrete level of simple random walks approximating Brownian motion.)
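A rough Monte Carlo sanity check of Theorem A.2 at a single time can be written in a few lines. The Python sketch below (our own illustration; grid size, $\varepsilon$ and number of paths are arbitrary choices) discretizes $B$ on $[0, 1]$, estimates $L_1$ through the occupation-time approximation (172) and compares the means of $S_1$ and $L_1$ with $\mathbb E(S_1) = \sqrt{2/\pi}$, which follows from the reflection principle. Comparing means is of course much weaker than equality in law as processes, and both estimates carry a small discretization bias.

```python
import numpy as np

def levy_identity_means(n_paths=3000, n_steps=1000, eps=0.05, seed=0):
    """Monte Carlo estimates of E[S_1] and E[L_1] for a Brownian motion on [0, 1]."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    # Gaussian random-walk discretisation of B on the grid k*dt, k = 1, ..., n_steps
    paths = np.cumsum(rng.standard_normal((n_paths, n_steps)) * np.sqrt(dt), axis=1)
    running_max = np.maximum(paths.max(axis=1), 0.0)           # S_1, using B_0 = 0
    # occupation-time approximation (172) of the local time at 0 up to time 1
    local_time = (np.abs(paths) < eps).sum(axis=1) * dt / (2 * eps)
    return running_max.mean(), local_time.mean()

s_mean, l_mean = levy_identity_means()
print(s_mean, l_mean, np.sqrt(2 / np.pi))
```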
Armed with this result, it is easy to prove equality in (173). What is needed is to show that almost surely, at every time $t \ge 0$ such that $B_t = 0$, $L$ increases (that is, $L_{t+h} > L_t$ for every $h > 0$). Start with $t = 0$: since $L$ has the same law as $S$, which increases almost surely right after $t = 0$, so does $L$, and so this property holds for $t = 0$. By the Markov property, it is not too hard to see that it is also true at any time $t$ such that $t = d_s$ for some fixed $s > 0$, where $d_s$ is the first zero after $s$, i.e., $d_s = \inf \mathcal Z \cap [s, \infty)$. Playing around with the fact that rational numbers are dense finishes the proof, and so we get

$$\mathrm{Supp}(dL_t) = \mathcal Z \qquad (175)$$

almost surely.

Our view of local times in these notes is purely utilitarian: even though they deserve much study in themselves, we will only stick to what we strictly need here. For our purposes, the last thing to define is the inverse local time: for any $\ell > 0$, define

$$\tau_\ell := \inf\{t > 0 : L_t > \ell\}. \qquad (176)$$

$\tau_\ell$ is thus the first time that $B$ accumulates more than $\ell$ units of local time at 0. Thus $\tau_\ell$ is a stopping time, and Lévy's identity (Theorem A.2) tells us that

$$(\tau_\ell, \ell \ge 0) \overset{d}{=} (T_x, x \ge 0), \qquad (177)$$

where $T_x$ is the hitting time of level $x$ by $B$. In particular, $(\tau_\ell, \ell \ge 0)$ has independent and stationary increments, and is nondecreasing: that is, $(\tau_\ell, \ell \ge 0)$ is a subordinator. Moreover, it is not hard to see that in fact $\tau$ is the stable subordinator with index $\alpha = 1/2$ (this follows simply from the reflection principle and the law of $S_t$). That is, the Lévy measure of $\tau$ has density

$$\frac{1}{\sqrt{2\pi}}\,x^{-3/2}, \qquad x > 0. \qquad (178)$$
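A quick numerical consistency check of (178) against (177): by the Lévy-Khintchine formula, a driftless subordinator with Lévy density (178) satisfies $\mathbb E(e^{-\lambda\tau_\ell}) = \exp(-\ell\,\Phi(\lambda))$ with $\Phi(\lambda) = \int_0^\infty (1 - e^{-\lambda x})\,\frac{x^{-3/2}}{\sqrt{2\pi}}\,dx$, while the classical formula $\mathbb E(e^{-\lambda T_x}) = e^{-x\sqrt{2\lambda}}$ for Brownian hitting times requires $\Phi(\lambda) = \sqrt{2\lambda}$. The Python sketch below (function name and grid are our own choices) evaluates $\Phi$ by the trapezoidal rule after the substitution $x = e^u$.

```python
import numpy as np

def laplace_exponent(lam, u_min=-40.0, u_max=40.0, n=400_001):
    """Compute int_0^inf (1 - exp(-lam*x)) x**(-3/2) / sqrt(2*pi) dx numerically,
    via the substitution x = exp(u), dx = exp(u) du, and the trapezoidal rule."""
    u = np.linspace(u_min, u_max, n)
    x = np.exp(u)
    # -expm1(-y) = 1 - exp(-y), computed accurately for small y; the last factor x is dx/du
    integrand = -np.expm1(-lam * x) * x ** (-1.5) / np.sqrt(2 * np.pi) * x
    h = u[1] - u[0]
    return h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

for lam in (0.5, 1.0, 2.0):
    print(lam, laplace_exponent(lam), np.sqrt(2 * lam))
```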



A.1.2 Excursion theory

We will now state Ito's theorems about excursions of Brownian motion, which make rigorous the intuition explained above. First, a remark: by (175), we see that if $e$ is an excursion of $B$ corresponding to the interval $[L, R]$, then the local time of $B$ is constant on that interval, since by definition there are no zeros during $(L, R)$. Thus if $(e_i)_{i \ge 1}$ is an enumeration of the Brownian excursions (which is possible since there are as many of them as there are jumps of a certain subordinator, and these are countable), then we can call $\ell_i$ the common local time of the excursion $e_i$, that is, the value of the local time $L_t$ at any time $t \in (L_i, R_i)$, where $[L_i, R_i]$ is the interval associated with $e_i$.

Theorem A.3. There exists a $\sigma$-finite measure $\nu$ on the space of excursions $\Omega^*$, such that the point process

$$\sum_{i \ge 1} \delta_{(\ell_i, e_i)}$$

is a Poisson point process on $\mathbb R_+ \times \Omega^*$, with intensity $d\ell \otimes \nu(de)$.

Definition A.2. $\nu$ is called Ito's excursion measure.

For instance, the number of excursions by time $\tau_\ell$ with length greater than some $\zeta_0 > 0$ is a Poisson random variable with mean $\ell\,\nu(\zeta > \zeta_0)$. Another consequence is, for example, that the quantity of local time accumulated by time $T_x$ (the hitting time of $x > 0$) is an exponential random variable with parameter $\kappa(x) := \nu(\sup_{s \ge 0} e_s > x)$: indeed, in the local time scale, the points that fall in the set of excursions that hit level $x$ form a Poisson process with constant intensity equal to $\kappa(x)$. Thus the first point is exponentially distributed with parameter $\kappa(x)$ as well.

Thus, in order to be useful, this theorem should be accompanied by some descriptions of Ito's excursion measure. First of all, the Ito measure of excursions of length greater than $x$ can be identified through (178), since the jumps of $\tau_\ell$ are precisely the excursion lengths. Thus from (178) we get the first description in the result below:

Theorem A.4. We have, for every $x > 0$:

$$\nu(\zeta > x) = \int_x^\infty \frac{dy}{\sqrt{2\pi y^3}} = \sqrt{\frac{2}{\pi x}}. \qquad (179)$$

Moreover, if $H = \sup_{s \ge 0} e_s$, then

$$\nu(H > h) = \frac{1}{2h}. \qquad (180)$$
Proof. It is easy to convince yourself that in (180), the right-hand side should be $\kappa/h$ for some $\kappa > 0$. Indeed, fix some $0 < x < h$. On the one hand, the number of excursions that reach $h$ by time $\tau_\ell$ is Poisson with mean, say, $\ell\kappa(h)$. On the other hand, this is a thinning of the number of excursions that reach $x$, which is also Poisson but with mean $\ell\kappa(x)$. The thinning probability is nothing but the probability that, given that an excursion reaches level $x$, it will also reach level $h$. However, it is plain to see that an excursion, given that it reaches $x$, behaves afterwards as a Brownian motion killed at 0. Thus the thinning probability is

$$p = \mathbb P_x(T_h < T_0) = \frac{x}{h}.$$

Hence we deduce:

$$\kappa(h) = \kappa(x)\,\frac{x}{h}$$

for all $0 < x < h$. Thus $\kappa(x)$ is equal to $\kappa/x$ for all $x \in (0, h)$ (for some $\kappa > 0$). Since $h$ is arbitrary, $\kappa(x) = \kappa/x$ for all $x > 0$. That $\kappa = 1/2$ requires more work but is classical: see, e.g., (2.10) in Chapter XII of [136]. Note that the answer (with the correct value of $\kappa$) can also be guessed from a discrete argument for simple random walk: at each visit of 0, the probability that the next excursion will reach $Nx$ is $(1/2)\cdot 1/(Nx)$. (The first factor $1/2$ comes from asking for a positive excursion, and the second is the familiar gambler's ruin probability.) By the $N$th visit, the total number of excursions that reach $Nx$ is thus approximately a Poisson random variable with mean $\kappa(x) = 1/(2x)$ as $N \to \infty$. □

One thing to pay attention to in (180) is that we do not count negative excursions in this random variable $H$. That is, $\nu(H > h)$ measures only those positive excursions that reach level $h$. There is an obvious symmetry property in $\nu$, so if instead we want to ask what is the measure of excursions that reach $h$ or $-h$ (which we often do when we think about reflecting Brownian motion), then this measure is now $1/h$ instead of $1/(2h)$.
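The discrete heuristic at the end of the proof of Theorem A.4 is easy to test. The Python sketch below (our own illustration; names and parameters are hypothetical) runs $N$ excursions of simple random walk away from 0, discards the negative ones, which cannot reach a positive level, follows each positive excursion until it returns to 0 or reaches level $\mathrm{round}(Nx)$, and averages the number of excursions that reach that level over many independent runs. Since each excursion succeeds with probability exactly $(1/2)\cdot 1/\mathrm{round}(Nx)$, the average should be close to $1/(2x)$ when $Nx$ is an integer.

```python
import numpy as np

def mean_number_reaching(n_visits, x, trials, seed=0):
    """Average over `trials` independent runs of the number of excursions of
    simple random walk, among n_visits excursions from 0, that reach the
    level round(n_visits * x) before returning to 0."""
    level = int(round(n_visits * x))
    if level < 2:
        raise ValueError("need n_visits * x >= 2")
    rng = np.random.default_rng(seed)
    # an excursion is positive iff its first step is +1, which has probability 1/2
    n_pos = rng.binomial(n_visits, 0.5, size=trials)
    run = np.repeat(np.arange(trials), n_pos)     # run to which each excursion belongs
    pos = np.ones(run.size, dtype=np.int64)       # positive excursions start at height 1
    hits = np.zeros(trials, dtype=np.int64)
    while pos.size:
        pos += 2 * rng.integers(0, 2, size=pos.size) - 1   # one +-1 step each
        hits += np.bincount(run[pos == level], minlength=trials)
        alive = (pos > 0) & (pos < level)                  # neither back at 0 nor at the level
        pos, run = pos[alive], run[alive]
    return hits.mean()

print(mean_number_reaching(50, 1.0, trials=2000), "vs 1/(2x) =", 0.5)
```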

A.2 Continuum Random Trees

After rushing through local times and excursion theory, we now propose another impressionistic rendering, this time of the theory of Continuum Random Trees: that is, how to construct them, and how they are related to Brownian excursions.
A.2.1 Galton-Watson Trees and Random Walks

The theory starts with a well-known observation that a (not necessarily random) rooted labelled tree may be described by a certain path, sometimes called the Lukasiewicz path of the tree. This path provides us with a convenient way of proving things about trees (as we will see, this path is a close cousin of random walk when the underlying tree is a Galton-Watson tree), but it is also very convenient from a purely practical point of view: this path is indeed a variant of the depth-first search process which is used in any algorithm dealing with trees and graphs in general.

First a few definitions: given a finite rooted labelled planar tree $T$, there is a unique way of labelling the tree in "lexicographical order". That is, the first vertex is the root, $u_0 = \emptyset$. We then list the children of the root, from left to right (this is why we require planarity), and label them $1, 2, \ldots, r$, say. We now go to the next generation, and attach to each vertex in the second generation a string of two characters (numbers) which is defined as follows: if that vertex is the $r_2$th child of the $r_1$th individual in the first generation, we attach the string $r_1r_2$. More generally, to any vertex in the $n$th generation, we attach a string of $n$ characters, $r_1 \ldots r_n$, which specifies the path that leads to this vertex: hence, to find the vertex whose label is $u = r_1 \ldots r_n$, find the $r_1$th individual at generation 1. In the next generation, find the $r_2$th child of that individual, and so on. This way of labelling all the vertices of the tree is called the canonical labelling of a planar labelled rooted tree. We may moreover list these vertices in lexicographical order (i.e., as if placing them in a dictionary). This gives us a list of vertices $(u_0, u_1, \ldots, u_{p-1})$. Note that this list entirely specifies the tree; its length $p$ is the total size of the tree.

There is a natural way to encode this data into a path: simply, as you go through the list $(u_0, \ldots, u_{p-1})$ (in lexicographical order), record the height of the vertex you're at. The height is just the generation or the level of the vertex: hence, a vertex in the second generation of the tree has a height equal to 2. The root has a height equal to 0. The height process of the tree $T$ is the discrete function

$$h(n) = \text{height of vertex } u_n, \qquad 0 \le n \le p - 1. \qquad (181)$$

See Figure 15 for an illustration.
Figure 15: A labelled planar tree and its height process.

The Lukasiewicz path of $T$, however, is a different process. Suppose that as we move through the tree in lexicographical order, we record the number of children $k_u$ of each vertex $u$. For $0 \le i \le p - 1$, define

$$X_i = k_{u_i} - 1. \qquad (182)$$

Thus $X_i$ is the number of children of vertex $u_i$, minus 1. Define

$$S_0 = 0, \qquad S_n = \sum_{i=0}^{n-1} X_i, \quad 1 \le n \le p. \qquad (183)$$

The interpretation of this has to do with the depth-first search of the tree. When we explore the tree, we can partition the tree into vertices that are active, dead, and those not touched yet. Dead vertices are those which we have already examined. Active vertices are children of dead vertices whose own children we haven't explored yet. Untouched vertices are all the rest: they are the descendants of active vertices. Then $S_n$ gives us the number of active vertices at stage $n$ of the lexicographical exploration of the tree, not counting the vertex $u_n$ which is about to be examined: indeed, when we explore vertex $u_n$ there are $k_{u_n}$ new vertices to add to the list of active vertices, but since we are examining $u_n$ we need to subtract 1. The path

$$(S_0, \ldots, S_p) \qquad (184)$$

is called the Lukasiewicz path. What is the connection between the two processes?
Lemma A.1. $h(n)$ is the number of times $j < n$ such that $S_j$ is equal to the infimum of $S$ between times $j$ and $n$:

$$h(n) = \mathrm{Card}\left\{0 \le j \le n - 1 : S_j = \inf_{j \le k \le n} S_k\right\}. \qquad (185)$$

The reason this is true is that $S$ only decreases when we have reached a leaf, which is also when $h$ may decrease. Thus any vertex $u_j$ such that $S_j$ is the future infimum of its path (up to time $n$) must be an ancestor of $u_n$. See Figure 16 for an illustration.

This is a simple combinatorial lemma, but its consequences are hard to overstate: it tells us that $h(n)$ may be seen as the local time at 0 of the process $S$ reflected at its infimum.
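Lemma A.1 is easy to check mechanically. In the Python sketch below (helper names are ours), a planar tree is given by the list of numbers of children of $u_0, \ldots, u_{p-1}$ in lexicographical order: `heights` computes $h(n)$ directly by a depth-first bookkeeping, `lukasiewicz` computes $(S_0, \ldots, S_p)$ as in (183), and `heights_via_lemma` evaluates the right-hand side of (185). Random test trees are produced by drawing i.i.d. critical geometric offspring numbers $\mathbb P(k) = 2^{-(k+1)}$ in depth-first order until the path first hits $-1$, which is one standard way of sampling a Galton-Watson tree.

```python
import numpy as np

def heights(children):
    """h(0), ..., h(p-1) for a planar tree given by the numbers of children of
    its vertices u_0, ..., u_{p-1} in lexicographical (depth-first) order."""
    h, stack = [], []          # one stack entry per ancestor of the current vertex:
    for k in children:         # how many of its children remain to be visited
        h.append(len(stack))
        stack.append(k)
        while stack and stack[-1] == 0:    # climb back up past exhausted vertices
            stack.pop()
        if stack:
            stack[-1] -= 1                 # the next vertex uses one of these slots
    return h

def lukasiewicz(children):
    """S_0 = 0 and S_n = sum_{i < n} (k_{u_i} - 1) for 1 <= n <= p, as in (183)."""
    return [0] + [int(s) for s in np.cumsum(np.asarray(children) - 1)]

def heights_via_lemma(S):
    """Right-hand side of (185): #{0 <= j <= n-1 : S_j = min(S_j, ..., S_n)}."""
    h = []
    for n in range(len(S) - 1):
        running_min, count = S[n], 0
        for j in range(n - 1, -1, -1):     # scan backwards, keeping min(S_j, ..., S_n)
            running_min = min(running_min, S[j])
            if S[j] == running_min:
                count += 1
        h.append(count)
    return h

def random_tree(rng, max_size=300):
    """Children counts of a critical Galton-Watson tree with geometric offspring
    P(k) = 2**-(k+1), in depth-first order; None if it exceeds max_size vertices."""
    children, s = [], 0
    while s >= 0:
        k = int(rng.geometric(0.5)) - 1    # numpy's geometric law lives on {1, 2, ...}
        children.append(k)
        s += k - 1
        if len(children) > max_size:
            return None
    return children
```

For the tree whose root has two children, the second of which has one child, the input is `[2, 0, 1, 0]`: both `heights` and `heights_via_lemma(lukasiewicz(...))` return `[0, 1, 1, 2]`, and the Lukasiewicz path is $(0, 1, 0, 0, -1)$.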

Now, consider an offspring distribution $\mu$ on $\mathbb N$, and consider the random Galton-Watson tree $T$ associated with the distribution $\mu$; that is, every individual has an i.i.d. number of offspring governed by the distribution $\mu$. We make the assumptions that

1. $\mu$ is critical: $\mathbb E(L) = 1$, where $L \sim \mu$.

2. $\mu$ has finite variance: $\mathbb E(L^2) < \infty$.

Observe that the Lukasiewicz path $(S_0, S_1, \ldots, S_p)$ associated with $T$ is now a random walk on $\mathbb Z$ started at $S_0 = 0$ and ended when it first hits level $-1$:

$$S_n = \sum_{i=0}^{n-1} X_i, \qquad p = \inf\{n \ge 1 : S_n = -1\},$$

where the $X_i$ are i.i.d. random variables whose distribution is equal in law to $L - 1$. In particular, by assumptions 1 and 2 above,

$$\mathbb E(X_i) = 0, \qquad \mathrm{var}(X_i) < \infty.$$

We may consider an infinite sequence of such critical Galton-Watson trees and concatenate their Lukasiewicz paths. Every time the path goes below its previous minimum, this corresponds to exploring a new tree. The height process representation of Lemma A.1 still holds. We have thus encoded each tree in an infinite forest of Galton-Watson trees by the excursions above the infimum of a certain random walk on $\mathbb Z$ with mean 0 and finite variance jump distribution (see Figure 16).
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A. 2. 2 Convergence to reflecting Brownian motion 

From the previous discussion, it is natural to consider a Brownian 
scaling of the height process, which is the most intuitive way of 
encoding the tree. 

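Theorem A.5. As $n \to \infty$, there is the following convergence in distribution, in the sense of the Skorokhod topology on $D(\mathbb{R}_+, \mathbb{R})$:

$$\left(\frac{1}{\sqrt{n}} H_{nt}, \; t \ge 0\right) \longrightarrow \left(\frac{2}{\sigma}|B_t|, \; t \ge 0\right), \qquad (186)$$

where $\sigma^2 = \mathrm{var}(L)$ is the offspring variance.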

Proof (sketch). This theorem is not hard to understand intuitively: indeed, the Lukasiewicz path, under this scaling, converges towards a Brownian motion with variance $\sigma^2$ per unit time (being a centered random walk with finite variance). The main observation is then to see that

$$H_n \approx \frac{2}{\sigma^2}(S_n - I_n), \qquad (187)$$

where $I_n = \min\{S_j : j \le n\}$ is the running minimum. Thus it is natural to expect the convergence (186), since by the Lévy identity (174), the process $(B_t - I_t, t \ge 0)$ is a reflected Brownian motion, i.e., it has the same law as $(|B_t|, t \ge 0)$.

Thus it suffices to explain (187). Recall Lemma A.1; note that $H_n$ is the number of jumps of the red curve in Figure 16.

By reversing the direction of time, we see that $H_n$ is also the number of times that the reverse path, $\hat S$ say, reaches a new lowest point (i.e., the number of jumps of the infimum process of the reverse walk $\hat S$). Now, each time the infimum process jumps, what is the distribution of the overshoot $Y$? Let us denote by $c > 0$ the mean of this distribution. That is, on average, every time the reverse process jumps downwards, it makes a jump of size $c$. It follows by the law of large numbers that, after $H_n$ jumps, where $H_n$ is large, the total decrease from the initial position is approximately $cH_n$. But since this decrease in position must be equal to $S_n - I_n$, we see that

$$S_n - I_n \approx c H_n,$$

from which we obtain $H_n \approx c^{-1}(S_n - I_n)$ after division by $c$. Putting things together we deduce that
things together we deduce that 



$$\left(\frac{1}{\sqrt{n}} H_{nt}, \; t \ge 0\right) \longrightarrow \left(\frac{\sigma}{c}|B_t|, \; t \ge 0\right)$$

as $n \to \infty$. It is not too difficult to compute the expectation of the overshoot distribution and find that $c = \sigma^2/2$. The result now follows. Further details can be found in Aldous [3], but see also Marckert and Mokkadem [117] and Le Gall and Le Jan [114]. □

Figure 16: Concatenation of the Lukasiewicz paths of three independent critical Galton-Watson trees, $T_1$, $T_2$ and $T_3$. $H_n$ is equal to the number of jumps of the red curve.
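The value $c = \sigma^2/2$ used at the end of the proof of Theorem A.5 can be checked with exact arithmetic. The sketch below takes as given a classical fact about random walks whose downward jumps are at most 1 (it is not proved in these notes): with increments distributed as $L - 1$, each overshoot $Y$ in the time-reversed argument has law $P(Y = j) = \mu((j, \infty))$ for $j \ge 0$. Summing then gives $E(Y) = E(L(L-1))/2 = \sigma^2/2$ when $E(L) = 1$. The offspring laws below are arbitrary examples.

```python
from fractions import Fraction
from typing import Dict

Law = Dict[int, Fraction]


def overshoot_law(mu: Law) -> Law:
    """Assumed classical identity: P(Y = j) = mu((j, infinity)) for j >= 0."""
    return {j: sum((p for k, p in mu.items() if k > j), Fraction(0))
            for j in range(max(mu))}


def mean(law: Law) -> Fraction:
    return sum((k * p for k, p in law.items()), Fraction(0))


def variance(law: Law) -> Fraction:
    m = mean(law)
    return sum(((k - m) ** 2 * p for k, p in law.items()), Fraction(0))


examples = {
    "binary (0 or 2)": {0: Fraction(1, 2), 2: Fraction(1, 2)},
    "uniform on {0,1,2}": {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)},
    "0/1/3": {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)},
}
for name, mu in examples.items():
    assert sum(mu.values()) == 1 and mean(mu) == 1  # a critical offspring law
    y = overshoot_law(mu)
    assert sum(y.values()) == 1                     # a probability law, since E(L) = 1
    print(name, mean(y), variance(mu) / 2)          # the two columns agree
```

Exact fractions are used so that the identity $E(Y) = \sigma^2/2$ is checked without rounding error.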

Many corollaries follow rather easily from this asymptotic result. As a case in point, consider the following statement: if $T$ is a Galton-Watson tree conditioned to reach a large level $p$, say, then its height process satisfies

$$\left(\frac{1}{p} H_{p^2 t}, \; t \ge 0\right) \longrightarrow \left(e^{(1)}_{4t/\sigma^2}, \; t \ge 0\right), \qquad (188)$$

where $e^{(1)}$ is a Brownian excursion conditioned to reach level 1 (that is, a realisation of $\nu(\cdot \mid H > 1)$, where $\nu$ is Itô's excursion measure and $H$ denotes the maximum of the excursion).

A.2.3 The Continuum Random Tree

We have seen in Theorem A.5 and (188) that the height process, which encodes the genealogy of a critical Galton-Watson tree conditioned to exceed a large height, converges in distribution towards a Brownian
excursion conditioned to reach a corresponding height. It is natural to expect that, as a result, if we now view a finite tree $T$ as a metric space (as we may: we just think of each edge as a segment of length 1), then, after rescaling this tree suitably, the metric space $T$ converges (in a suitable sense) towards a limiting metric space $\Theta$. This is indeed the case, and the sense of this convergence is the Gromov-Hausdorff metric. As this is pretty heavy machinery, we will not explain this construction. Instead, we will describe the limiting object (the Continuum Random Tree of Aldous [5]), and ask the reader to trust us that $\Theta$ is indeed the scaling limit of large critical Galton-Watson trees. More details concerning the Gromov-Hausdorff topology and this convergence can be found, for instance, in Evans' Saint-Flour notes [78].

Figure 17: A simplified representation of the Brownian Continuum Random Tree. In reality branching occurs continuously.
We now explain the definition of the Continuum Random Tree. Let $f : \mathbb{R}_+ \to \mathbb{R}_+$ be an excursion, i.e., an element of $\Omega^*$, and let $\zeta$ be the lifetime of this excursion. We wish to think of $f$ as the height function of a certain continuous tree, and that will mean the following: the vertices of the tree can be identified with the interval $[0, \zeta]$ (the time at which we visit this vertex), provided that we make the identification between two times $s < t$ such that

$$f(t) = f(s) = \inf_{u \in [s,t]} f(u). \qquad (189)$$

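Indeed, on a discrete tree $T$, if $s < t$ are two times in $\{0, \ldots, |T|-1\}$, then the length of the geodesic between $u_s$ and $u_t$ is easily seen to be, up to an additive error of at most 2,

$$h(s) + h(t) - 2 \inf_{u \in [s,t]} h(u), \qquad (190)$$

since that distance is simply the sum of two terms, which are the numbers of generations between $u_s$ and $v$ and between $u_t$ and $v$ (where $v$ is the most recent common ancestor of $u_s$ and $u_t$). Moreover, this most recent common ancestor is at height $h(v) = \inf_{u \in [s,t]} h(u)$ if $u_s$ is an ancestor of $u_t$, and at height $\inf_{u \in [s,t]} h(u) - 1$ otherwise. The error disappears if $h$ is replaced by the contour process described below.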

Thus let $\sim$ be the equivalence relation on $[0, \zeta]$ defined by (189), and let

$$\Theta = [0, \zeta]/\!\sim \qquad (191)$$

be the quotient space obtained from that relation. On the quotient space $\Theta$, we introduce the distance

$$d(s,t) = f(s) + f(t) - 2 \inf_{u \in [s,t]} f(u),$$

which is easily seen to be a distance on $\Theta$.

Definition A.3. The metric space $(\Theta, d)$ is the continuum tree derived from $f$. If $f(t) = 2e_t$, $t \in [0,1]$, where $(e_t, t \ge 0)$ is the Brownian excursion conditioned so that $\zeta = 1$, then the random metric space $(\Theta, d)$ derived from $f$ is called the (standard) Continuum Random Tree (or CRT for short).

To help make sense of this definition, we note that if $T$ is a discrete tree, and if $C(t)$ is the contour process of $T$ (i.e., the linear interpolation of the process which navigates at speed 1 along the edges, exploring the tree in the order of depth-first search but backtracking rather than jumping when it has reached a leaf), then $T$ is isometric to the tree $\Theta$ derived from $C(t)$ in Definition A.3.
Thus we have the following: 

1. Any time $t$ such that $f(t)$ is a local minimum is a branching point of the tree.

2. Any time $t$ such that $f(t)$ is a local maximum is a leaf of the tree (there are in fact many other leaves).
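The identification (189) and the distance $d$ can be checked on a small discrete tree through its contour process, for which the distance formula is exact. The sketch below, on a hypothetical example tree, records the contour function $C$ at integer times together with the vertex occupied at each time. It then checks that $C(s) + C(t) - 2\min_{[s,t]} C$ equals the graph distance between the corresponding vertices, and that any two times identified by (189) correspond to the same vertex.

```python
from collections import deque
from itertools import combinations
from typing import Dict, List, Tuple

Tree = Dict[int, List[int]]  # vertex -> ordered list of children


def contour(tree: Tree, root: int = 0) -> Tuple[List[int], List[int]]:
    """Contour function at integer times and the vertex occupied at each time."""
    heights: List[int] = []
    visited: List[int] = []

    def explore(v: int, depth: int) -> None:
        heights.append(depth)
        visited.append(v)
        for c in tree.get(v, []):
            explore(c, depth + 1)
            heights.append(depth)  # backtrack along the edge from c to v
            visited.append(v)

    explore(root, 0)
    return heights, visited


def tree_distance(f: List[int], s: int, t: int) -> int:
    """d(s, t) = f(s) + f(t) - 2 min_{u in [s, t]} f(u)."""
    s, t = min(s, t), max(s, t)
    return f[s] + f[t] - 2 * min(f[s:t + 1])


def graph_distance(tree: Tree, a: int, b: int) -> int:
    """Breadth-first search on the undirected version of the tree."""
    adj: Dict[int, List[int]] = {}
    for v, kids in tree.items():
        for c in kids:
            adj.setdefault(v, []).append(c)
            adj.setdefault(c, []).append(v)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        v = queue.popleft()
        for w in adj.get(v, []):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist[b]


tree = {0: [1, 4], 1: [2, 3], 4: [5]}  # hypothetical example tree
C, visited = contour(tree)
print(C)  # [0, 1, 2, 1, 2, 1, 0, 1, 2, 1, 0]
for s, t in combinations(range(len(C)), 2):
    assert tree_distance(C, s, t) == graph_distance(tree, visited[s], visited[t])
    if C[s] == C[t] == min(C[s:t + 1]):  # s ~ t in the sense of (189)
        assert visited[s] == visited[t]
```

Replacing the contour function by the height process in this check would fail for pairs of vertices that are not in an ancestor relation, which is why the contour process is the natural discrete analogue of $f$ here.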

For us, any tree associated with a Brownian excursion (be it a Brownian excursion conditioned to be of duration 1 or be it an excursion conditioned to reach above level $x$ for some $x > 0$, for instance) will be called a Brownian CRT: naturally, they are related by a simple scaling.

A.3 Continuous-State Branching Processes

A.3.1 Feller diffusion and Ray-Knight theorem

Come back for a moment to the critical Galton-Watson model with finite variance that we have already introduced. Assume for instance that the population at some time $t > 0$ is very large, say $Nx$ for some $x > 0$. Then the population size at the next generation can be written as the sum of $Nx$ i.i.d. random variables with mean 1 (and finite variance $\sigma^2$). Thus, if we let $Z_t = Nx$, we have

$$E(dZ_t \mid \mathcal{F}_t) = 0 \quad \text{and} \quad \mathrm{var}(dZ_t \mid \mathcal{F}_t) = \sigma^2 Z_t.$$

This suggests the following diffusion approximation.

Theorem A.6 (Feller 1951 [81]). Assume $Z_0 = N$. After rescaling (speeding up time by a factor $N$ and dividing the total population size by $N$), the process $(Z_{\lfloor Nt \rfloor}/N, t \ge 0)$ converges in the Skorokhod topology towards the unique in law solution, started from $Z_0 = 1$, of

$$dZ_t = \sigma \sqrt{Z_t}\, dW_t, \qquad (192)$$

where $W$ is a standard Brownian motion. The diffusion (192) with $\sigma^2 = 1$ is called the Feller diffusion.
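One consequence of Theorem A.6 that can be checked exactly is the extinction probability. For the critical geometric offspring law $P(L = k) = 2^{-(k+1)}$ (an illustrative choice, with $\sigma^2 = 2$), the generating function is $f(s) = 1/(2 - s)$, so the probability that $N$ initial individuals have no descendants in generation $n$ is $f^{\circ n}(0)^N$. For the diffusion (192) started from $z$, a standard computation with the Laplace exponent gives $P(Z_t = 0) = \exp(-2z/(\sigma^2 t))$. The sketch compares the two under the rescaling of Theorem A.6 ($n = \lfloor Nt \rfloor$ generations, $z = 1$); the function names are hypothetical.

```python
import math


def gw_extinct_by(n_gen: int, n_init: int) -> float:
    """P(Z_n = 0 | Z_0 = n_init) for offspring law P(L = k) = 2^{-(k+1)},
    whose generating function is f(s) = 1 / (2 - s)."""
    q = 0.0
    for _ in range(n_gen):
        q = 1.0 / (2.0 - q)  # q = f^{(n)}(0): one ancestor's line has died out
    return q ** n_init       # the n_init lines of descent are independent


def feller_extinct_by(t: float, z: float, sigma2: float) -> float:
    """P(Z_t = 0 | Z_0 = z) for dZ = sigma sqrt(Z) dW."""
    return math.exp(-2.0 * z / (sigma2 * t))


t, sigma2 = 1.0, 2.0
for n in (10, 100, 1000, 10000):
    discrete = gw_extinct_by(int(n * t), n)
    limit = feller_extinct_by(t, 1.0, sigma2)
    print(n, round(discrete, 5), round(limit, 5))
```

The discrete probabilities approach $e^{-1}$ as $N$ grows, as the diffusion approximation predicts for this offspring law.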
The Feller diffusion is also sometimes known as the squared Bessel process of dimension 0 (up to a deterministic linear time change). However, there isn't much intuition to gain from that connection (it is hard to imagine what a Brownian motion is in dimension 0). Being the scaling limit of critical Galton-Watson processes, $Z$ is a continuous-state branching process (CSBP), associated with the branching mechanism $\psi(u) = u^2/2$. Indeed, it is an easy exercise in stochastic calculus to check directly that the Feller diffusion enjoys the branching property:

Proposition A.2. Let $Z(x)$ be the Feller diffusion started from $Z_0 = x$. Then $Z$ has the branching property:

$$Z(x + y) \overset{d}{=} Z(x) + Z(y),$$

where the two processes on the right-hand side are independent.

Proof. To see this, let $B$ and $B'$ be two independent Brownian motions, and consider two independent Feller diffusions $Z$ and $Z'$ driven by $B$ and $B'$ respectively. Then one has to show that $Z + Z'$ also satisfies (192), as it is easy to check that uniqueness in distribution holds. Now, if $Y = Z + Z'$, note that

$$dY_t = dZ_t + dZ'_t = \sqrt{Z_t}\, dB_t + \sqrt{Z'_t}\, dB'_t = \sqrt{Y_t}\, dW_t,$$

where

$$W_t = \int_0^t \sqrt{\frac{Z_s}{Y_s}}\, dB_s + \int_0^t \sqrt{\frac{Z'_s}{Y_s}}\, dB'_s.$$

Thus $W$ is a local martingale and, since $B$ and $B'$ are independent, $W$ has quadratic variation equal to

$$[W]_t = \int_0^t \frac{Z_s}{Y_s}\, ds + \int_0^t \frac{Z'_s}{Y_s}\, ds = t,$$

and hence, by Lévy's characterisation, $W$ is a Brownian motion. Thus $Z$ has the branching property. □
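The branching property can also be illustrated numerically. The sketch below is only a Monte Carlo illustration under assumptions of our own. It uses an Euler-Maruyama scheme for $dZ_t = \sqrt{Z_t}\,dW_t$, absorbed at 0 (which introduces a small discretisation bias), and compares the Laplace transform $E(e^{-\lambda Z_t})$ of $Z(x+y)$ with that of $Z(x) + Z'(y)$ for independent copies. It also compares both with $\exp(-(x+y)\lambda/(1 + \lambda t/2))$, which follows from $\psi(u) = u^2/2$. NumPy is assumed to be available; all parameter values are arbitrary.

```python
import numpy as np


def feller_paths(z0: float, t: float, n_steps: int, n_paths: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Euler-Maruyama for dZ = sqrt(Z) dW, absorbed at 0; returns samples of Z_t."""
    dt = t / n_steps
    z = np.full(n_paths, z0, dtype=float)
    for _ in range(n_steps):
        dw = rng.standard_normal(n_paths) * np.sqrt(dt)
        z = np.maximum(z + np.sqrt(z) * dw, 0.0)  # z stays >= 0, so sqrt is safe
    return z


def laplace(samples: np.ndarray, lam: float) -> float:
    """Empirical Laplace transform E(exp(-lam Z))."""
    return float(np.mean(np.exp(-lam * samples)))


rng = np.random.default_rng(7)
x, y, t, lam = 0.4, 0.6, 1.0, 1.0
kw = dict(t=t, n_steps=500, n_paths=20000, rng=rng)
joint = feller_paths(x + y, **kw)                      # Z(x + y)
split = feller_paths(x, **kw) + feller_paths(y, **kw)  # Z(x) + Z'(y), independent
exact = np.exp(-(x + y) * lam / (1.0 + lam * t / 2.0))
print(laplace(joint, lam), laplace(split, lam), exact)  # all close to exp(-2/3)
```

Comparing Laplace transforms is convenient here because the branching property is equivalent to the exponent $-x\,u_t(\lambda)$ being linear in the starting point $x$.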



We now explain the connection between this diffusion and the celebrated Ray-Knight theorem on the local times of Brownian motion. Recall the setup of Theorem A.5, where we have an infinite sequence of critical Galton-Watson trees, and we showed that the concatenated height processes converge towards reflecting Brownian motion after rescaling.

Observe also that the number of visits of the height process to a certain level is precisely the total number of vertices in this generation. Thus,

$$\text{Local time of height process} \approx \text{Population size}. \qquad (193)$$

In particular, if we want to consider only the first $N$ trees $T_1, \ldots, T_N$, we simply stop the height process at the time of its $N$th visit to the origin, or equivalently, when it has accumulated a local time at the origin equal to $N$. The total population generated by these first $N$ trees then evolves, generation by generation, precisely like a Galton-Watson process started with $N$ individuals (it doesn't matter that these individuals weren't connected to the same root earlier in time). Thus the Feller diffusion approximation of Theorem A.6 holds, and given the above principle that the population size is the same as the local time of the height process, we obtain:

Theorem A.7 (Ray-Knight theorem for Brownian motion). Let $(B_t, t \ge 0)$ be a Brownian motion reflected at 0, and let $\tau_1 = \inf\{t > 0 : L_t > 1\}$, where $L_t$ is the local time at 0 of $B$. For $x > 0$, let

$$Z_x = L(\tau_1, x)$$

be the total local time that $B$ accumulates at level $x$ before $\tau_1$. Then $(Z_x, x \ge 0)$ is the Feller diffusion.

The Ray-Knight theorem (discovered simultaneously and independently by Ray and Knight) is actually more general than that, as there exists for instance a version of this result which describes the behaviour of $L(T_x, a)$ as a function of $a$, while $x > 0$ is fixed (here $T_x$ denotes the hitting time of $x$). It is one of the most useful tools for studying one-dimensional Brownian motion, and has for instance been used extensively to describe polymer models (see, e.g., [99] or [13]). Below we will see that this is actually a much more general statement about continuum random trees and continuous-state branching processes.

A.3.2 Height process and the CRT

We now show how the relation between the Feller diffusion (which here is seen as an example of a CSBP) and the standard continuum
random tree may be generalised to other CSBPs. This generalisation is along the lines of the work of Le Gall and Le Jan [114]. Essentially, we only touch on the surface of some fairly deep ideas that have been developed in the last 10 years, and of which the excellent monograph by Duquesne and Le Gall [64] gives a much broader overview.

Le Gall and Le Jan [114] proposed to study the rescaling of the height process of discrete Galton-Watson trees whose population size process converges towards a given continuous-state branching process. They showed indeed the existence of a scaling limit for the height process, which takes the following form (the one which we quote is a variation on Theorem 2.2.1 in [64]).

Theorem A.8. Let $Z$ be a fixed CSBP. Let $L^{(N)}$ be the offspring distribution in Theorem 4.5, and let $c_N$ be the associated time-scale which guarantees convergence of the rescaled Galton-Watson process towards $Z$. Let $H^{(N)}$ be the height process associated with an infinite sequence of i.i.d. random trees with offspring distribution $L^{(N)}$. Then we also have convergence of the rescaled height process:

$$\left(\frac{1}{c_N} H^{(N)}_{\lfloor N c_N t \rfloor}, \; t \ge 0\right) \longrightarrow (H_t, \; t \ge 0)$$

in the sense of finite-dimensional distributions.

See Theorem 2.3.1 in [64] for a statement concerning the stronger convergence in the sense of the Skorokhod topology (basically, this convergence is proved under the condition that the CSBP becomes extinct, i.e., Grey's condition (114), together with a minor technical condition).

It is now a good time to recall our principle "one function, one tree": the limiting height process $(H_t, t \ge 0)$ encodes a certain Continuum Random Tree $\Theta$, and the convergence in Theorem A.8 ensures that the corresponding rescaled trees converge in distribution (in the sense of the Gromov-Hausdorff metric) towards $\Theta$. This is a somewhat sloppy statement: for this to be true, we have to talk about the Galton-Watson tree conditioned to reach a large height, for instance, much as we did in (188). Moreover, for this to make sense, we need to make sure that $H$ is almost surely continuous. That turns out to be true if and only if the corresponding branching process becomes extinct, i.e., if and only if Grey's condition (114) holds.
Naturally, having made this definition we want to see how the CSBP relates to the Continuum Random Tree $\Theta$, and the answer is again via a Ray-Knight theorem. Thus, define

$$L(t, x) = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon} \int_0^t \mathbf{1}_{\{x < H_s \le x + \varepsilon\}}\, ds, \qquad (194)$$

which is the amount of local time that $H$ spends near $x$ up to time $t$. One thing to realise is that it is not at all obvious why the limit (194) exists, as $H$ is neither a Markov process nor a semimartingale. This limit is in fact shown to exist in $L^1$ (uniformly in $t$) by Duquesne and Le Gall in Proposition 1.3.3 of [64]. The Ray-Knight theorem in this setup states:

Theorem A.9. Let $(H_t, t \ge 0)$ be the height process of a $\psi$-CSBP. Let $L(t, x)$ be its joint local time process and let $(\tau_z, z \ge 0)$ be the inverse local time at $x = 0$. Then

$$(Z_x = L(\tau_z, x), \; x \ge 0) \qquad (195)$$

is a $\psi$-CSBP started from $Z_0 = z$.

The advantage of having introduced a tree to describe a CSBP is that it makes it possible to discuss issues related to the genealogy of this continuous-state branching process. For instance, the number of individuals at time 0 who have descendants at time $x > 0$ is equal to the number of excursions above 0 that reach $x$ (and is thus finite almost surely under Grey's condition (114)).

Much as in the case of the Brownian CRT, where Itô's excursion measure can be used to describe the statistics of "infinitesimal trees" above a given level, there is a valid generalisation of excursion theory to height processes. This generalisation can be stated in terms of excursions as in Theorem A.3 or in terms of trees. We will refrain from stating this result explicitly, except to say informally that the collection of trees generated by the height process above a certain level $x > 0$ is a Poisson point process of trees with intensity $dL(\cdot, x)$ (the local time scale at level $x$) times a certain excursion measure, $\nu$. For instance, given $L(\tau_1, x) = z$, the number of excursions (or trees) that reach level $x + h$ is Poisson with mean $z\,\nu(\sup_{s \ge 0} H_s > h)$. Moreover, this is "independent from what happened at lower levels". (However, unlike in the Brownian case, it makes no sense to talk about excursions below $x$, for which there is no excursion property.)
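A heuristic consequence of the Poisson description above can be checked numerically. Since $Z_{x+h} = 0$ exactly when no excursion above $x$ reaches $x + h$, one expects $\exp(-z\,\nu(\sup H > h)) = P(Z_{x+h} = 0 \mid Z_x = z)$. Taking as given the classical formula $P(Z_h = 0 \mid Z_0 = z) = e^{-z v(h)}$, where $v(h)$ solves $\int_{v(h)}^{\infty} du/\psi(u) = h$ (the integral is finite precisely under Grey's condition (114)), this suggests $\nu(\sup_s H_s > h) = v(h)$. The sketch below solves for $v(h)$ by quadrature and bisection. The two branching mechanisms, $\psi(u) = u^2/2$ (Feller, with $v(h) = 2/h$) and $\psi(u) = u^{3/2}$ (with $v(h) = 4/h^2$), are illustrative choices, and NumPy is assumed.

```python
from typing import Callable

import numpy as np

Psi = Callable[[np.ndarray], np.ndarray]


def tail_integral(psi: Psi, v: float, w_max: float = 80.0, n: int = 20001) -> float:
    """int_v^inf du / psi(u), via u = v e^w (du = u dw) and the trapezoid rule on [0, w_max]."""
    w = np.linspace(0.0, w_max, n)
    u = v * np.exp(w)
    g = u / psi(u)
    dw = w[1] - w[0]
    return float(dw * (g.sum() - 0.5 * (g[0] + g[-1])))


def v_of_h(psi: Psi, h: float, lo: float = 1e-8, hi: float = 1e8) -> float:
    """Solve int_v^inf du / psi(u) = h for v by bisection on log v
    (the left-hand side is decreasing in v)."""
    for _ in range(100):
        mid = float(np.sqrt(lo * hi))
        if tail_integral(psi, mid) > h:
            lo = mid  # integral still too large: v must be bigger
        else:
            hi = mid
    return float(np.sqrt(lo * hi))


def feller_psi(u: np.ndarray) -> np.ndarray:
    return u ** 2 / 2.0  # psi(u) = u^2 / 2


def stable_psi(u: np.ndarray) -> np.ndarray:
    return u ** 1.5      # psi(u) = u^{3/2}, another illustrative mechanism


for h in (0.5, 1.0, 2.0):
    print(h, v_of_h(feller_psi, h), 2.0 / h, v_of_h(stable_psi, h), 4.0 / h ** 2)
```

The substitution $u = v e^w$ turns the infinite range into an exponentially decaying integrand, so a plain trapezoid rule on a truncated interval is accurate for both mechanisms.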
We finish this section with a description of the excursion measure, which is the analogue of Theorem A.4.

Theorem A.10. Assume Grey's condition (114).

Naturally, this result says exactly the same thing as Theorem 4.7 for the lookdown process. Indeed, it can be proved directly using the same arguments, or can be deduced from it: it turns out that the notions of genealogy for $(Z_t, t \ge 0)$, as defined by the continuum random tree and by the lookdown process, are identical. Recall that in the world of the CRT, an individual is identified with the subtree of its descendants, i.e., with an excursion above a certain level, and $u$ is an ancestor of $v$ if the excursion associated with $v$ is a piece of the excursion associated with $u$. In the lookdown process, on the other hand, individuals are seen as levels of a countable population, and individual $i$ at time $s$ is an ancestor of individual $j$ at time $t > s$ if $\xi_i(s) = \xi_j(t)$, where $((\xi_i(t), t \ge 0))_{i \ge 1}$ denotes the lookdown process.

The following result was proved in [20], and shows that the two notions are identical, in the following sense. Let $(Z_t, t \ge 0)$ be a $\psi$-CSBP started from $Z_0 = r > 0$ satisfying (114), and assume that $Z_t$ is obtained as the local times of the height process $(H_t, t \le \tau_r)$ as in Theorem A.9. The key point is to order the excursions above a certain level $t$ suitably. We choose to rank them according to their supremum. That is, we denote by $e_j(t)$ the $j$th highest excursion above the level $t$. We draw a sequence of i.i.d. random variables $(U_i)_{i \ge 1}$ uniform on $(0, 1)$. For each $j \ge 1$, we associate to $e_j(0)$ the label $U_j$. As $t$ increases, a given excursion may split: we decide that the children subexcursions each inherit the label of the parent excursion. We define a process $\xi_j(t)$, for all $j \ge 1$ and all $t \ge 0$, by saying that $\xi_j(t)$ is the label of $e_j(t)$. Note that when an excursion splits, a fairly complex transition may occur for $(\xi_j(t), t \ge 0)$, as the excursions $e_j(t)$ are always ordered by their height. In fact, we have the following result (Theorem 14 in [20]):

Theorem A.11. The process $(\xi_j(t), t \ge 0)$ is the Donnelly-Kurtz lookdown process associated with $(Z_t, t \ge 0)$.